How to implement branching in bison-based interpreter? - virtual-machine

I am developing a virtual machine. The bytecode interpreter uses flex and bison.
Here is some example code:
some:
add r0 4 4
jmp some
My question is: how do I handle the jmp instruction?
Can I ask bison to go back to a label and continue the analysis?
I am developing a bytecode interpreter, not a compiler...

No, you can't make bison go back. You usually use Bison to parse the code and generate some kind of intermediate representation, like an AST or bytecode. Then you execute that in a separate step.
So in your case, since you're parsing an assembly language for a bytecode format, it makes sense to translate that into actual bytecode. That is, when your parser sees "add r0 4 4", all it should do is append the corresponding sequence of bytes to an array containing your bytecode. Then, after the parser has filled this array, you can pass it to a function that actually executes the bytecode.
It would probably also make sense to split these two steps into two separate programs: an assembler that turns a source file into a binary bytecode file, and a bytecode interpreter that reads a bytecode file and executes it. The latter would not need to use Bison at all, just read the bytes and switch on them.
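To make that concrete, here is a minimal sketch of the two-step approach in C++. Everything here is hypothetical (the opcode names, the emit_* helpers, the run function); it is not your ISA, just the shape of the idea: parser actions only ever append bytes, labels become byte offsets, and the separate dispatch loop handles jmp by assigning to the program counter.
// Sketch only: hypothetical opcodes, not the asker's actual instruction set.
// Step 1: parser actions append instructions to a code array.
// Step 2: a dispatch loop executes the array; "jmp" just rewrites the pc.
#include <cstdint>
#include <cstdio>
#include <vector>
enum Op : uint8_t { OP_ADD, OP_JMP, OP_HALT };
struct VM {
    std::vector<uint8_t> code;   // filled in by the parser, executed later
    int64_t reg[8] = {0};
};
// What a bison action such as
//   instr: ADD REG NUM NUM { emit_add(vm, $2, $3, $4); }
// would ultimately do: append bytes, never "go back" in the input.
void emit_add(VM& vm, uint8_t dst, uint8_t a, uint8_t b) {
    vm.code.push_back(OP_ADD);
    vm.code.push_back(dst);
    vm.code.push_back(a);
    vm.code.push_back(b);
}
// Labels are resolved to byte offsets at parse time (or via backpatching),
// so OP_JMP carries a plain target offset.
void emit_jmp(VM& vm, uint8_t target) {
    vm.code.push_back(OP_JMP);
    vm.code.push_back(target);
}
void run(VM& vm) {
    size_t pc = 0;
    for (;;) {
        switch (vm.code[pc]) {
        case OP_ADD: {
            uint8_t dst = vm.code[pc + 1];
            vm.reg[dst] = vm.code[pc + 2] + vm.code[pc + 3]; // immediates, for brevity
            pc += 4;
            break;
        }
        case OP_JMP:
            pc = vm.code[pc + 1];  // branching is just "set the pc"
            break;
        case OP_HALT:
            return;
        }
    }
}
int main() {
    VM vm;
    emit_add(vm, 0, 4, 4);       // add r0 4 4
    vm.code.push_back(OP_HALT);  // (a real "jmp some" back to offset 0 would loop forever)
    run(vm);
    std::printf("r0 = %lld\n", (long long)vm.reg[0]);
}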


What is an interpreter, exactly?

According to Wikipedia, an interpreter uses at least one of the following strategies:
Parse the source code and perform its behavior directly;
Translate source code into some efficient intermediate representation or object code and immediately execute that;
Explicitly execute stored precompiled bytecode made by a compiler and matched with the interpreter Virtual Machine.
So is a program that reads code and executes it directly an interpreter? Does an interpreter need to convert code into binary? Does a compiler need to convert code into binary?
So is a program that reads code and executes it directly an interpreter?
Yes. By definition, an interpreter reads code, then performs what the code tells it to do. Unlike an interpreter, a compiler reads the code then makes an executable file that can be run later.
Does an interpreter need to convert code into binary?
Not always. An interpreter may just read the input code and perform what the code tells it to do, but other interpreters use JIT compilation. Interpreters that use JIT compilation turn the input code into machine code, but do not make an executable file. Instead, they run the generated code in memory and throw it away after it has been run. JIT compilation can be faster than traditional interpretation.
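A tiny illustration of that "machine code in memory, no executable file" idea (this assumes a POSIX system on x86-64 that permits writable+executable pages; hardened systems may refuse the mmap, and real JITs are of course far more involved):
// Writes a few bytes of machine code into memory, runs them, discards them.
#include <cstdio>
#include <cstring>
#include <sys/mman.h>
int main() {
    // x86-64 encoding of: mov eax, 42 ; ret
    unsigned char code[] = {0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3};
    void* mem = mmap(nullptr, sizeof code, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return 1;
    std::memcpy(mem, code, sizeof code);
    // POSIX-specific cast of a data pointer to a function pointer.
    int (*fn)() = reinterpret_cast<int (*)()>(mem);
    std::printf("%d\n", fn());    // runs the freshly generated machine code
    munmap(mem, sizeof code);     // "thrown away" after running; no file written
}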
Does a compiler need to convert code into binary?
Yes. In order to create an executable file, a compiler must first read the input code and turn it into something the computer can understand (machine code). This first step is just like JIT compilation. Unlike JIT compilation, a compiler does not run the machine code it produces, nor does it throw it away. Instead, it writes it to a file (called an executable file, or just an executable) in a specific format for the OS it is being compiled on. This format difference is why Windows programs cannot run on Linux, and vice versa.

How to compile a perl6 program to generate bytecode?

I am trying to understand Perl 6 and how it differs from Perl 5. I have learned that Perl 6 is a compiled language, but I don't see how: it does not seem to generate any intermediate code (a directly executable file or JVM bytecode)?
I can't find any option to do this. How is it done?
Currently I am able to directly execute my code.
$ perl6-j hello.p6
Hello world
I am following https://github.com/rakudo/rakudo
You can use --target= on the perl6 command line to see a human-readable trace of each stage of the compiler. On the JVM, if you wish to have "compiled" bytecode output, you can use --target=jar and then take a look inside there. But ultimately Perl 6 compiles on the fly unless asked otherwise. It leaves a bytecode representation of each "CompUnit" cached in library path directories, so that the compile step is faster next time. This can be seen in .precomp directories. The precomp cache is very tricky to use by hand because of how Perl 6 hashes and indexes all comp units; this is so that libraries with the same name but different version and author can sit side by side. On MoarVM there is no equivalent to --target=jar, but in the .precomp directory you can see the raw bytecode files, which can be directly executed by moar if you link the runtime setting.
Updating this answer, as this is now supported.
To generate the bytecode for a perl6 program, run perl6 --target=<backend> --output=foo foo.pl6. You can use mbc, jvm, or js as your target backend. The bytecode will be written to the file foo.
Writing bytecode to a file, both for modules and programs, is not officially supported yet. Hence the lack of documentation for --target.
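For example, on the MoarVM backend (the output file name foo.moarvm here is just an illustration):
$ perl6 --target=mbc --output=foo.moarvm foo.pl6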

What is the difference between bytecode files like Java class files and machine code executables like ELF? - bytecode

What are the differences between bytecode binary executables such as Java class files, Parrot bytecode files, or CLR files, and machine code executables such as ELF, Mach-O, and PE?
What are the distinctive differences between the two?
For example, which part of the class file corresponds to the .text section in the ELF structure?
Or: they all have headers, but the ELF and PE headers contain the architecture while the class file does not.
Java class file
ELF file
PE file
Byte code is, as imulsion noted, an intermediate step, right before compilation into machine code. Because the last step is left to load time (and often runtime, as is the case with Just-In-Time (JIT) compilation), byte code is architecture independent: the runtime (the CLR for .NET or the JVM for Java) is responsible for mapping the byte code opcodes to their underlying machine code representation.
By comparison, native code (Windows: PE, PE32+; OS X/iOS: Mach-O; Linux/Android/etc: ELF) is compiled code, suited for a particular architecture (Android/iOS: ARM; most else: Intel 32-bit (i386) or 64-bit). These are all very similar, but still require sections (or, in Mach-O parlance, "load commands") to set up the memory structure of the executable as it becomes a process (old DOS supported the ".com" format, which was a raw memory image). In all the above, you can say, roughly, the following:
Sections with a "." are created by the compiler, and are "default" or expected to have default behavior
The executable has the main code section, usually called "text" or ".text". This is native code, which can run on the specific architecture
Strings are stored in a separate section. These are used for hard-coded output (what you print out) as well as symbol names.
Symbols - which are what the linker uses to put together the executable with its libraries (Windows: DLLs, Linux/Android: Shared Objects, OS X/iOS: .dylibs or frameworks) are stored in a separate section. Usually there is also a "PLT" (Procedure Linkage Table) which enables the compiler to simply put in stubs to the functions you call (printf, open, etc), that the linker can connect when the executable loads.
The import table (in Windows parlance; in ELF this is the DYNAMIC section, in OS X this is the LC_LOAD_DYLIB command) is used to declare additional libraries. If those aren't found when the executable is loaded, the load fails, and you can't run it.
The export table (for libraries/dylibs/etc.) lists the symbols which the library (or, in Windows, even an .exe) exports so that others can link with it.
Constants are usually in what you see as the ".rodata".
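As a small illustration of the question's point that the ELF header names the architecture while a class file does not, here is a sketch that reads an ELF header and prints its machine field (Linux and a 64-bit, ELFCLASS64 file assumed; the constants come from the system's elf.h):
// Reads the ELF magic and the e_machine field from a binary.
#include <cstdio>
#include <elf.h>
int main(int argc, char** argv) {
    if (argc < 2) { std::fprintf(stderr, "usage: %s <elf-file>\n", argv[0]); return 1; }
    std::FILE* f = std::fopen(argv[1], "rb");
    if (!f) { std::perror("fopen"); return 1; }
    Elf64_Ehdr hdr;
    if (std::fread(&hdr, sizeof hdr, 1, f) != 1) { std::fclose(f); return 1; }
    std::fclose(f);
    // e_ident holds the magic; e_machine names the target architecture.
    if (hdr.e_ident[EI_MAG0] != ELFMAG0 || hdr.e_ident[EI_MAG1] != ELFMAG1 ||
        hdr.e_ident[EI_MAG2] != ELFMAG2 || hdr.e_ident[EI_MAG3] != ELFMAG3) {
        std::fprintf(stderr, "not an ELF file\n");
        return 1;
    }
    std::printf("e_machine = %u (for reference, EM_X86_64 = %u)\n",
                (unsigned)hdr.e_machine, (unsigned)EM_X86_64);
}
A Java class file, by contrast, starts with only a magic number (0xCAFEBABE) and a format version; there is no architecture field to check.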
Hope this helps. Really, your question was vague..
TG
Byte code is a 'halfway' step. The Java compiler (javac) turns the source code into byte code. Machine code is the next step: the computer takes the byte code, turns it into machine code (which the computer can read), and then executes your program by running that machine code. Computers cannot read source code directly, and javac does not translate it straight into machine code; the halfway step is what makes the program work across machines.
Note that ELF binaries don't necessarily need to be machine/arch specific per se.
The interesting piece is the "interpreter" header field: it holds the path name of a loader program that is executed instead of the actual binary. This loader is then responsible for loading the actual program, loading and linking libraries, etc. This is how, e.g., ld.so comes in.
Theoretically one could create an ELF binary that holds Java bytecode (or a complete jar). It just needs an appropriate "interpreter" program which starts up a JVM and loads the code from the binary into it.
Not sure whether this has actually been done before, but it's certainly possible.
The same can be done with pretty much any non-native code.
It could also serve for direct multiarch support via some VM like qemu:
Let the target platform (libc + linker scripts) put the arch name into the interpreter program name (e.g. /lib/ld.so.x86_64, /lib/ld.so.armhf, ...).
Then, on a particular arch (e.g. x86_64), the interpreter with the native arch name points to the original ld.so, while the others point to a special loader that calls up something like qemu-system-XXX.
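For what it's worth, here is a sketch of how one could read that interpreter field, i.e. the PT_INTERP program header (again Linux and a 64-bit ELF assumed; a real loader does far more than this):
// Prints the PT_INTERP path of a 64-bit ELF binary.
#include <cstdio>
#include <elf.h>
int main(int argc, char** argv) {
    if (argc < 2) return 1;
    std::FILE* f = std::fopen(argv[1], "rb");
    if (!f) return 1;
    Elf64_Ehdr ehdr;
    if (std::fread(&ehdr, sizeof ehdr, 1, f) != 1) return 1;
    // Walk the program headers looking for PT_INTERP.
    for (unsigned i = 0; i < ehdr.e_phnum; ++i) {
        Elf64_Phdr phdr;
        std::fseek(f, (long)(ehdr.e_phoff + i * sizeof phdr), SEEK_SET);
        if (std::fread(&phdr, sizeof phdr, 1, f) != 1) break;
        if (phdr.p_type == PT_INTERP) {
            char path[256] = {0};
            size_t n = phdr.p_filesz < sizeof path - 1 ? (size_t)phdr.p_filesz
                                                       : sizeof path - 1;
            std::fseek(f, (long)phdr.p_offset, SEEK_SET);
            std::fread(path, 1, n, f);
            std::printf("interpreter: %s\n", path); // e.g. /lib64/ld-linux-x86-64.so.2
            break;
        }
    }
    std::fclose(f);
}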

In which language is the proto compiler (of google protocol buffers) written?

I would like to know in which language the "proto compiler" (the compiler used to generate Java, Python, or C++ source files from .proto files) is written. Is it maybe a mix of languages?
Any help would be appreciated.
Thanks in Advance
Horace
It appears to be written in C++. There's also documentation on Java and Python APIs, but those don't appear to contain the compiler itself (at least I don't see anything that's obviously the compiler in either case, though I didn't spend a whole lot of time looking for it either).
That said, I'm almost tempted to vote to close -- for most practical purposes, the language used to implement the compiler is basically a trivia question, irrelevant to actual use. There is, however, an entirely legitimate exception: if you're going to download and modify the compiler, knowing the language you'd need to work with could be quite useful.
The protoc compiler is written in C or C++ (it's a native program, anyway).
When I want to process proto files in Java, I:
Use the protoc command to convert them to a protocol buffer file, i.e.
protoc protofile.proto --descriptor_set_out=OutputFile
Read the new protocol buffer file (it's a FileDescriptorSet) and use it.
An overcomplicated example is the compileProto method in
http://code.google.com/p/protobufeditor/source/browse/trunk/%20protobufeditor/Source/ProtoBufEditor/src/net/sf/RecordEditor/ProtoBuf/re/display/ProtoLayoutSelection.java
It's complicated because the protoc command and options can be stored in a properties file.
Note: the getFileDescriptor method reads the newly created protocol buffer file.
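As a minimal sketch of that second step, here it is in C++ (the equivalent FileDescriptorSet API also exists in Java); this assumes protobuf is installed and the program is linked against libprotobuf:
// Reads a FileDescriptorSet written by
//   protoc protofile.proto --descriptor_set_out=OutputFile
// and lists the .proto files and message types it contains.
#include <fstream>
#include <iostream>
#include <google/protobuf/descriptor.pb.h>
int main(int argc, char** argv) {
    if (argc < 2) return 1;
    std::ifstream in(argv[1], std::ios::binary);
    google::protobuf::FileDescriptorSet set;
    if (!set.ParseFromIstream(&in)) {
        std::cerr << "not a valid FileDescriptorSet\n";
        return 1;
    }
    for (const auto& file : set.file()) {
        std::cout << file.name() << "\n";
        for (const auto& msg : file.message_type())
            std::cout << "  message " << msg.name() << "\n";
    }
}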

What converts VBScript to machine code?

Compiled languages like C# and Java have just-in-time compilers that convert them (from byte code) into machine code (0s and 1s). How does an interpreted language like VBScript get converted into machine code? Is it done by the operating system?
They don't necessarily get converted to machine code (and often don't).
The interpreter for that program runs the appropriate actions according to what the program requires.
Some interpreters might generate machine code (using JIT compilers), others might stick to plain interpretation of the script.
I know this is old, but given that I can't comment (rep), I want to add a clarifying answer:
An interpreter is used to interpret the script (be it VBScript, JavaScript, Python, or any other script) into individual instructions. These instructions can be in the form of machine code or an intermediate representation (that the OS or another program can use). Some interpreters are made for something closer to assembly language, and the source code is more or less executed directly.
Most modern scripting languages (e.g., Python, Perl, Ruby) are interpreted to an intermediate representation, and sometimes from that intermediate representation into compiled (a.k.a. machine, a.k.a. object) code. The important distinction (vs. compiled languages) is that an interpreter isn't taking an entire body of code and translating its meaning to machine code; it's taking each line at a time and interpreting its meaning as a standalone unit.
Think of this as the difference between translating an entire essay from English to Russian (compiled code) vs taking each sentence in the essay and translating it directly (interpreted code). You may get a similar effect, but the result won't be identical. More importantly, translating an entire essay as a total body of work takes a lot more effort than doing one sentence at a time as a standalone unit, but the whole translation will be much easier for Russian speakers to read than the rather clunky sentence-by-sentence version. Hence the tradeoff between compiling code vs interpreting code.
Source: https://en.wikipedia.org/wiki/Interpreter_(computing), experience
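To illustrate the "one sentence at a time" model, here is a toy sketch with a made-up two-command language (this is nothing to do with the actual VBScript engine; it only shows each line being parsed and acted on as a standalone unit):
// Toy line-at-a-time interpreter: nothing is translated ahead of time.
#include <iostream>
#include <sstream>
#include <string>
int main() {
    std::string line;
    while (std::getline(std::cin, line)) {    // take one "sentence"...
        std::istringstream words(line);
        std::string cmd;
        words >> cmd;
        if (cmd == "say") {                   // ...and act on it immediately
            std::string rest;
            std::getline(words, rest);
            std::cout << rest << "\n";
        } else if (cmd == "add") {
            int a = 0, b = 0;
            words >> a >> b;
            std::cout << a + b << "\n";
        } else if (!cmd.empty()) {
            std::cerr << "unknown command: " << cmd << "\n";
        }
    }
}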
This is the answer I was looking for. Like a JavaScript engine, there used to be a VBScript engine that converted human-readable code to machine code. This VBScript engine is analogous to the JIT compiler in the CLR and JVM, except that it converts directly from human-readable code to machine code, as opposed to C#, which has an intermediate byte code.
Referring to the VBScript Wikipedia article:
When VBScript is executed in a browser, vbscript.dll is used to interpret it.
When a VBScript file is executed from the command line or a batch file, cscript.exe is used to interpret it.
When VBScript is used by the Windows OS itself for various purposes, like showing error message boxes or the yellow notification messages in the corner of the task bar, it is interpreted using wscript.exe, the windowed version of the Windows Script Host.