How can I extend JVM bytecode?

I want to create a programming language that compiles to its own bytecode format, plus a VM that interprets it, but I want the bytecode to remain compatible with the JVM. I searched for a way to insert comments into JVM bytecode so that my own VM could parse them, but I couldn't find one. I also tried inserting extra bytes at the start and at the end of the byte array, but that produced a ClassFormatError. Is there any workaround?

"compiles to its own bytecode format" and "compatible with JVM" are mutually exclusive requirements.
If you want the JVM to be able to parse your classfiles, they must strictly comply with the Java Virtual Machine Specification, Chapter 4, "The class File Format".
The standard way of extending the class file format is attributes. You may invent your own attributes and attach them to one or more of the following structures (a byte-level sketch follows the list):
ClassFile
field_info
method_info
Code
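Since the JVM silently ignores attributes whose names it does not recognize, a custom attribute is just a name plus opaque bytes. Here is a minimal byte-level sketch of the attribute_info structure from JVMS §4.7; the helper name is mine, and nameIndex must point at a CONSTANT_Utf8 entry you have added to the constant pool yourself:

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class AttributeEncoder {
    // attribute_info { u2 attribute_name_index; u4 attribute_length; u1 info[]; }
    static byte[] encodeAttribute(int nameIndex, byte[] payload) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeShort(nameIndex);    // u2 attribute_name_index (CONSTANT_Utf8)
        out.writeInt(payload.length); // u4 attribute_length
        out.write(payload);           // u1 info[attribute_length]
        return bytes.toByteArray();
    }
}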

Scala does this, so you should check that out. As apangin mentioned, the standard way to insert "comments" into a classfile is with attributes. The classfile format defines certain standard attributes that are used by the JVM, but you can also include your own user-defined attributes, which may carry arbitrary data. This is what Scala uses to embed the metadata required by the Scala runtime.
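If you would rather not write the classfile plumbing by hand, the ASM library lets you model custom attributes directly. A rough sketch, assuming the org.ow2.asm:asm dependency; the attribute name "MyVMData" is made up:

import org.objectweb.asm.Attribute;
import org.objectweb.asm.ByteVector;
import org.objectweb.asm.ClassWriter;

class MyVMDataAttribute extends Attribute {
    private final byte[] payload;

    MyVMDataAttribute(byte[] payload) {
        super("MyVMData"); // unknown to the JVM, so it is ignored at load time
        this.payload = payload;
    }

    @Override
    protected ByteVector write(ClassWriter cw, byte[] code, int codeLength,
                               int maxStack, int maxLocals) {
        return new ByteVector().putByteArray(payload, 0, payload.length);
    }
}

You would then attach it with classWriter.visitAttribute(new MyVMDataAttribute(data)) and read it back in your own VM.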


SystemVerilog module namespaces

I am combining two designs into a single chip design. The RTL code is written in SystemVerilog for synthesis. Unfortunately, the two designs contain a number of modules with identical names but slightly different logic.
Is there a namespace or library capability in SystemVerilog that would allow me to specify different modules with the same name? In other words, is there a lib1::module1, lib2::module1 syntax I could use to specify which module I want? How is this sort of module namespace pollution best handled?
Thanks
Look into config and library. See IEEE Std 1800-2017 § 33, "Configuring the contents of a design".
library maps files to target libraries based on file paths (IEEE Std 1800-2017 § 33.3, "Libraries").
config selects which library to use for a particular module (globally, per instance, or per subscope) (IEEE Std 1800-2017 § 33.4, "Configurations").
Examples are provided in § 33.8; a rough sketch follows below.
Note: some simulators want -libmap <configfile> on the command line. Refer to your simulator's manual.
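For a sense of how the pieces fit together (the library names and file paths here are made up):

// library map file, passed to the tool (e.g. via -libmap):
library lib1 "ip_a/rtl/*.sv";
library lib2 "ip_b/rtl/*.sv";

// configuration: choose which library's module1 each instance binds to
config top_cfg;
  design worklib.top;
  default liblist worklib lib1;        // search worklib first, then lib1
  instance top.u_blockB liblist lib2;  // this subtree takes lib2's modules
endconfig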
Unfortunately, neither Verilog nor SystemVerilog provides a comprehensive solution to the namespace problem for design elements (which include modules). V2K libraries and config statements (yes, they were introduced in Verilog-2001) can partially solve this issue, but for modules only, and only if you plan for it in advance and use the right methodology to implement it. Not many people try to use V2K libraries to solve it.
There are other parts to this problem as well, which you might discover: other design elements, macro names, file names, package names, and so on. SystemVerilog makes it even worse by introducing global scopes.
So, depending on the complexity of your design, you might be able to fix it with V2K libraries. But in general the solution lies in methodology and in uniquifying those names up front. Some companies even use on-the-fly uniquification, automatically rewriting Verilog code to make the names unique.
You might also be able to solve some of these issues using compilation units, as defined in the SystemVerilog standard and implemented at least by the major tool vendors.

Can I set a maximum value for a number in protobuf?

In protobuf, we only have the choice of signed or unsigned 32- or 64-bit integers to limit the range of a value.
However, the data structure I want to define contains a mixture of 8-bit, 16-bit and 32-bit integers to save space on embedded devices. On those devices the data structure is also implemented somewhat differently and requires reserved special values for some fields, so the maximum value for them is not a power of 2.
On these embedded devices, the protobuf definition is only used for transmission to and from them, not for actual storage. So I could just limit the numbers when reading them in.
However, I'd rather define these maximum values in the .proto or .options file to make sure all client applications are aware of these limitations.
Is there a way to do this?
I know there are field options, but the ones listed here do not include an option for this. It is possible to create custom options, but that seems to require writing a compiler extension, which means I would have to implement the limit checking manually for every language I want to compile to, and that costs more time than it would ever save.
This is not possible in protobuf by default, and the specification includes no syntax to enforce limits like this.
However, some third-party implementations do include such support.
For example, my own nanopb has the int_size option:
int_size: Override the integer type of a field. (To use e.g. uint8_t to save RAM.)
This will return an error at runtime from pb_decode() if the value does not fit in the field.
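As a sketch of how that is applied, assuming nanopb's .options file syntax (the message and field names are placeholders):

# Force an 8-bit C type for this field; values that do not fit
# produce a pb_decode() error at runtime.
SensorReading.temperature int_size:IS_8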
No, there is no syntax for expressing that intent, and no built-in tool or codegen that will enforce the rule you want to add. You would need to handle this manually.
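A sketch of what the manual handling could look like on the Java side after parsing; the field name and the limit of 250 are hypothetical:

// Enforce the application-level limit that the .proto cannot express.
static int checkedTemperature(int raw) {
    final int MAX_TEMPERATURE = 250; // values above this are reserved
    if (raw < 0 || raw > MAX_TEMPERATURE) {
        throw new IllegalArgumentException("temperature out of range: " + raw);
    }
    return raw;
}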

Implementing Java serialization independent encoding

I want to use several encodings in the presentation layer to encode an object/structure from the application layer, independently of the encoding scheme (such as binary, XML, etc.) and of the programming language (Java, JavaScript, PHP, C).
An example would be to transfer an object from a producer to a consumer in a byte stream. The Java client would encode it using something like this:
Object var = new Dog();
output.writeObject(var);
The server would share the Dog class definitions and could regenerate the object doing something like this:
Object var = input.readObject();
assertTrue(var instanceof Dog); // passes
It is important to note that producer and consumer would not share the type of var, and therefore the consumer would not need the type to decode var. They would only share data type definitions, if anything:
public interface Pojo {}
public class Dog implements Pojo { int i; String s; } // Generated by framework from a spec
What I found:
Java Serialization: it is language-dependent; it cannot be used with, for example, JavaScript.
Protobuf library: it is limited to a specific binary format, and it is not possible to support additional binary formats. It needs the name of the class (the "class" of the message).
XStream, Simple, etc.: they are rather limited to text/XML and require the name of the class.
ASN.1: the standards are there and could be used with OBJECT IDENTIFIER and type definitions, but they are short on documentation and tutorials.
I prefer the 4th option because, among other things, it is a standard. Is there any active project that supports these requirements (especially something based on ASN.1)? Any usage examples? Does the project include codecs (DER, BER, XER, etc.) that can be selected at runtime?
Thanks
You can find several open source and commercial implementations of ASN.1 tools. These usually include:
a compiler for the schema, which will generate code in your desired programming language
a runtime library which is used together with the generated code for encoding and decoding
ASN.1 is mainly used in the standardized communication protocols of the telecom industry, so the commercial tools have very good support for the ASN.1 standard and the various encoding rules.
Here are some starter tutorials and even free e-books:
http://www.oss.com/asn1/resources/asn1-made-simple/introduction.html
http://www.oss.com/asn1/resources/reference/asn1-reference-card.html
http://www.oss.com/asn1/resources/books-whitepapers-pubs/asn1-books.html
I know that the OSS ASN.1 commercial tools (http://www.oss.com/asn1/products/asn1-products.html) will support switching the encoding rules at runtime.
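If you want to experiment before committing to a toolchain, you can also hand-build ASN.1 values with Bouncy Castle's runtime classes. A minimal sketch, assuming the org.bouncycastle:bcprov artifact; a real ASN.1 compiler would generate this kind of code from a schema instead:

import java.io.IOException;
import org.bouncycastle.asn1.ASN1Encodable;
import org.bouncycastle.asn1.ASN1Integer;
import org.bouncycastle.asn1.DERSequence;
import org.bouncycastle.asn1.DERUTF8String;

public class DogEncoder {
    // Encode Dog { i, s } as a DER SEQUENCE { INTEGER, UTF8String }.
    static byte[] encode(int i, String s) throws IOException {
        return new DERSequence(new ASN1Encodable[] {
                new ASN1Integer(i),
                new DERUTF8String(s)
        }).getEncoded("DER");
    }
}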
To add to bosonix's answer, there are also Objective Systems' tools at http://www.obj-sys.com/. The documentation from both OSS and Objective Systems includes many example uses.
ASN.1 is pretty much perfect for what you're looking for. I know of no other serialisation system that does this quite so thoroughly.
As well as a whole array of different binary encodings (ranging from the comprehensively tagged BER all the way down to the very packed-together PER), it has XML and now also JSON encodings too. These are well standardised by the ITU, so in theory it is fully interoperable between tool vendors, programming languages, OSes, etc.
There are other significant benefits to ASN.1. The schema language lets you define constraints on the values of message fields, or on the sizes of arrays, and these then get checked for you by the generated code. This is far more complete than many other serialisations. For instance, Google Protocol Buffers doesn't let you do this, meaning that you have to check the range of message fields (where applicable) in hand-written code. That's tedious, error-prone, and hard to maintain.
The only other ones that do this are XSD and JSON schemas. However, with those you're at the mercy of the varying quality of the tools used to turn them into source code; I've not yet seen any decent ones for JSON schemas. I'm also not aware of whether Microsoft's xsd.exe honours such constraints.

What is the need of JVM when you can pass the source code?

I am new to Java, and I wanted to know this:
What is the need to create the .class file in Java?
Can't we just pass the source code to every machine, so that each machine compiles it according to its OS and hardware?
I believe it's mostly for efficiency reasons.
From Wikipedia (http://en.wikipedia.org/wiki/Bytecode):
Bytecode, also known as p-code (portable code), is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable source code, bytecodes are compact numeric codes, constants, and references (normally numeric addresses) which encode the result of parsing and semantic analysis of things like type, scope, and nesting depths of program objects. They therefore allow much better performance than direct interpretation of source code.
(my emphasis)
And, as others have mentioned, possible weak obfuscation of the source code.
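To see the difference concretely, compile a trivial class and inspect it with javap -c; the parsing and semantic analysis happen once, at compile time, and every JVM afterwards just executes the compact instruction stream:

// Hello.java: compile once (javac Hello.java), run anywhere (java Hello).
public class Hello {
    public static void main(String[] args) {
        // javap -c Hello shows this line as a handful of bytecodes:
        // getstatic, ldc, invokevirtual, return.
        System.out.println("Hi");
    }
}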
The main reason for the compilation is that the virtual machines which host and run Java classes only understand bytecode.
And since compiling a class to the form the virtual machine understands on every run would be expensive, the source code is compiled into bytecode once, ahead of time.
There are also compilers which compile source code directly into machine code, but that's a different story which I don't know much about.

What do the (filename.java.i, filename.jar.i) extensions mean?

I have files named xxx.java.i, xxx.java.d, and xxx.jar.i. I know that these files are somehow related to Java. What do these extensions mean, and what are they used for? Are they the same kind of thing as the .class extension?
You should look at your build system for more information. It is possible that these are intermediate files that get transformed and renamed to ".java". For example, I've seen various build systems that use the ".i" suffix to mean "input", and perform various forms of variable substitution (e.g. changing something like "{VERSION_NUMBER}" to the version number of the library being compiled).
I think they were created by someone for their own purposes, and unless we ask the author or see the contents, we won't know what that purpose is.
If you see garbled characters, it's probably Java bytecode, and you can use a decompiler to see the code (see: How do I "decompile" Java class files?).