According to Wikipedia, an interpreter uses at least one of the following strategies:
Parse the source code and perform its behavior directly;
Translate source code into some efficient intermediate representation or object code and immediately execute that;
Explicitly execute stored precompiled bytecode made by a compiler and matched with the interpreter Virtual Machine.
So is a program that reads code and executes it directly an interpreter? Does an interpreter need to convert code into binary? Does a compiler need to convert code into binary?
So is a program that reads code and executes it directly an interpreter?
Yes. By definition, an interpreter reads code, then performs what the code tells it to do. Unlike an interpreter, a compiler reads the code then makes an executable file that can be run later.
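As a rough illustration of "reads code, then performs what the code tells it to do", here is a minimal sketch of a direct interpreter. The three-statement language (`set`, `add`, `print`) and the `run` function are made up for this example; they are not taken from any real interpreter.

```python
def run(source):
    """Interpret a tiny made-up language, one statement per line.

    Hypothetical statements, for illustration only:
        set NAME NUMBER
        add NAME NUMBER
        print NAME
    """
    variables = {}
    output = []
    for line in source.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        op = parts[0]
        if op == "set":
            variables[parts[1]] = int(parts[2])
        elif op == "add":
            variables[parts[1]] += int(parts[2])
        elif op == "print":
            output.append(str(variables[parts[1]]))
        else:
            raise ValueError(f"unknown statement: {line!r}")
    return output


program = """
set x 40
add x 2
print x
"""
result = run(program)
print("\n".join(result))  # prints 42
```

Nothing is translated or saved here: each statement is acted on as soon as it is read.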
Does an interpreter need to convert code into binary?
Not always. An interpreter may simply read the input code and perform what it says, but other interpreters use JIT compilation. An interpreter that uses JIT compilation turns the input code into machine code but does not make an executable file. Instead, it runs the machine code in memory and throws it away afterwards. JIT compilation can be faster than traditional interpretation.
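The "translate in memory, run, then discard" idea can be sketched with CPython's built-in `compile` and `exec`. Note the assumption: CPython produces bytecode here, not machine code, so this is closer to the second strategy in the Wikipedia list than to a real JIT. It only shows that nothing has to be written to disk.

```python
source = "result = sum(n * n for n in range(4))"

# Translate the source into a code object (CPython bytecode) held in memory.
code = compile(source, "<in-memory>", "exec")

# Execute it straight away; no file is produced at any point.
namespace = {}
exec(code, namespace)
print(namespace["result"])  # 0 + 1 + 4 + 9 = 14
```

Once `code` goes out of scope, the translated form is gone, and the next run has to translate the source again.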
Does a compiler need to convert code into binary?
Yes, for a compiler that produces native executables. In order to create an executable file, a compiler must first read the input code and turn it into something the computer can understand (machine code). This first step is just like JIT compilation. Unlike a JIT compiler, such a compiler does not run the machine code it produces and does not throw it away. Instead, it writes it to a file (called an executable file, or just executable) in a specific format for the OS it is being compiled for. This format is one reason why Windows programs cannot run on Linux, and vice versa.
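To make "a specific format for the OS" a little more concrete, here is a small sketch that guesses the executable format from the first bytes of a file. The function name `identify_executable` is made up; the magic numbers are the well-known ones for ELF, the MZ header that PE files start with, and thin Mach-O images in either byte order.

```python
def identify_executable(header: bytes) -> str:
    """Guess an executable format from the first bytes of a file."""
    mach_o_magics = (
        b"\xfe\xed\xfa\xce", b"\xce\xfa\xed\xfe",  # 32-bit, either byte order
        b"\xfe\xed\xfa\xcf", b"\xcf\xfa\xed\xfe",  # 64-bit, either byte order
    )
    if header.startswith(b"\x7fELF"):
        return "ELF"
    if header.startswith(b"MZ"):
        return "PE/MZ"
    if header[:4] in mach_o_magics:
        return "Mach-O"
    return "unknown"


# Sample prefixes rather than real files, so the sketch runs anywhere.
print(identify_executable(b"\x7fELF\x02\x01\x01"))  # ELF
print(identify_executable(b"MZ\x90\x00"))           # PE/MZ
print(identify_executable(b"\xcf\xfa\xed\xfe"))     # Mach-O
```

In practice you would pass the first few bytes read from a file opened in binary mode.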
Related
I am trying to understand Perl 6 and how it differs from Perl 5. I have learned that Perl 6 is a compiled language, but I don't see how: it does not seem to generate any intermediate code (a directly executable file or JVM bytecode).
I can't find any option to do that. How can I do it?
Currently I am able to directly execute my code.
$ perl6-j hello.p6
Hello world
I am following https://github.com/rakudo/rakudo
You can use --target= on the perl6 command line to see a human readable trace of each stage of the compiler. On JVM if you wish to have a "compiled" bytecode output you can use --target=jar and then take a look inside there. But ultimately Perl 6 compiles on the fly unless asked otherwise. It leaves a bytecode representation cached in library path directories of each "CompUnit", so that the compile step is faster next time. This can be seen in .precomp directories. The precomp cache is very tricky to use by hand due to how Perl 6 hashes and indexes all comp units. This is so libraries with the same name but different version and author can sit side by side. On MoarVM there is no equivalent to --target=jar but in the .precomp directory you can see the raw bytecode files that can be directly executed by moar if you link the runtime setting.
Updating the answer for this as this is now supported.
To generate the bytecode for a perl6 program, run perl6 --target=<backend> --output=foo foo.pl6. You can use mbc, jvm, or js as your target backend. The bytecode will be written to the file foo.
Writing bytecode to a file, both for modules and programs, is not officially supported yet. Hence the lack of documentation for --target.
I am new to Java.
I wanted to know this.
What is the need to create the .class file in Java?
Can't we just pass the source code to every machine so that each machine can compile it according to its OS and hardware?
I believe it's mostly for efficiency reasons.
From wikipedia http://en.wikipedia.org/wiki/Bytecode:
Bytecode, also known as p-code (portable code), is a form of
instruction set designed for efficient execution by a software
interpreter. Unlike human-readable source code, bytecodes are compact
numeric codes, constants, and references (normally numeric addresses)
which encode the result of parsing and semantic analysis of things
like type, scope, and nesting depths of program objects. They
therefore allow much better performance than direct interpretation of
source code.
(my emphasis)
And, as others have mentioned, there is possibly some weak obfuscation of the source code.
The main reason for the compilation is that the virtual machines that host and run Java classes only understand bytecode.
Compiling a class into the language the virtual machine understands every time it is run would be expensive, which is why the source code is compiled into bytecode once, ahead of time.
There are also compilers that compile source code directly into machine code, but that's a different story that I don't know much about.
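The compile-once idea is not specific to Java. As a loose analogy only (this is Python, not the JVM), the standard `py_compile` module writes a bytecode file that later runs can load without parsing the source again. The file names below are placeholders.

```python
import importlib.util
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "hello.py")
    out = os.path.join(tmp, "hello.pyc")
    with open(src, "w") as f:
        f.write('print("Hello world")\n')

    # Compile once; the bytecode file can be reused on later runs.
    py_compile.compile(src, cfile=out, doraise=True)

    with open(out, "rb") as f:
        magic = f.read(4)

# The file begins with a marker for the bytecode format version, much as a
# .class file begins with a magic number and a version.
print(magic == importlib.util.MAGIC_NUMBER)  # True
```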
What are the differences between the byte code binary executables such as Java class files, Parrot bytecode files or CLR files and machine code executables such as ELF, Mach-O and PE.
what are the distinctive differences between the two?
such as the .text area in the ELF structure is equal to what part of the class file?
or they all have headers but the ELF and PE headers contain Architecture but the Class file does not
Java Class File
Elf file
PE File
Byte code is, as imulsion noted, an intermediate step, right before compilation into machine code. Because the last step is left to load time (and often runtime, as is the case with Just-In-Time (JIT) compilation), byte code is architecture independent: the runtime (CLR for .NET or JVM for Java) is responsible for mapping the byte code opcodes to their underlying machine code representation.
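A toy sketch of that mapping: the same byte sequence can be shipped to any machine, and a host program decides what each opcode means. The four opcodes and the `execute` function are invented for illustration; real JVM or CLR bytecode is far richer and is usually JIT-compiled rather than dispatched like this.

```python
PUSH, ADD, MUL, HALT = 0, 1, 2, 3  # made-up opcodes, not JVM or CLR ones


def execute(bytecode: bytes) -> int:
    """Run a program for a tiny hypothetical stack machine."""
    stack = []
    pc = 0  # index of the next byte to read
    while True:
        op = bytecode[pc]
        pc += 1
        if op == PUSH:
            stack.append(bytecode[pc])  # operand is the following byte
            pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == HALT:
            return stack.pop()
        else:
            raise ValueError(f"bad opcode {op} at offset {pc - 1}")


program = bytes([PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT])
print(execute(program))  # (2 + 3) * 4 = 20
```

The `program` bytes contain nothing about the CPU they will run on; only `execute` (standing in for the runtime) is tied to the host.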
By comparison, native code (Windows: PE, PE32+, OS X/iOS: Mach-O, Linux/Android/etc: ELF) is compiled code, suited for a particular architecture (Android/iOS: ARM, most else: Intel 32-bit (i386) or 64-bit). These are all very similar, but still require sections (or, in Mach-O parlance, "Load Commands") to set up the memory structure of the executable as it becomes a process (old DOS supported the ".com" format, which was a raw memory image). In all the above, you can say, roughly, the following:
Sections with a "." are created by the compiler, and are "default" or expected to have default behavior
The executable has the main code section, usually called "text" or ".text". This is native code, which can run on the specific architecture
Strings are stored in a separate section. These are used for hard-coded output (what you print out) as well as symbol names.
Symbols - which are what the linker uses to put together the executable with its libraries (Windows: DLLs, Linux/Android: Shared Objects, OS X/iOS: .dylibs or frameworks) are stored in a separate section. Usually there is also a "PLT" (Procedure Linkage Table) which enables the compiler to simply put in stubs to the functions you call (printf, open, etc), that the linker can connect when the executable loads.
Import table (in Windows parlance; in ELF this is the DYNAMIC section, in OS X an LC_LOAD_DYLIB command) is used to declare additional libraries. If those aren't found when the executable is loaded, the load fails, and you can't run it.
Export table (for libraries/dylibs/etc) are the symbols which the library (or in Windows, even an .exe) can export so as to have others link with.
Constants are usually in what you see as the ".rodata".
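On the question's point that ELF and PE headers record an architecture while a class file does not, here is a minimal sketch that reads both kinds of header. The helper names and the small `ELF_MACHINES` table are made up for this example and cover only a few common `e_machine` values; the sample headers are hand-built byte strings, not real files.

```python
import struct

# A few common e_machine values from the ELF specification.
ELF_MACHINES = {3: "x86", 40: "ARM", 62: "x86-64", 183: "AArch64"}


def elf_machine(data: bytes) -> str:
    """Return the target architecture recorded in an ELF header."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[5] is EI_DATA: 1 = little-endian, 2 = big-endian.
    endian = "<" if data[5] == 1 else ">"
    # e_type and e_machine follow the 16-byte e_ident array.
    _e_type, e_machine = struct.unpack_from(endian + "HH", data, 16)
    return ELF_MACHINES.get(e_machine, f"unknown ({e_machine})")


def class_version(data: bytes):
    """Return (major, minor) from a Java class file header."""
    magic, minor, major = struct.unpack_from(">IHH", data, 0)
    if magic != 0xCAFEBABE:
        raise ValueError("not a class file")
    return major, minor


# Hand-built sample headers, just enough bytes for the fields read above.
elf_header = b"\x7fELF\x02\x01\x01" + bytes(9) + struct.pack("<HH", 2, 62)
class_header = struct.pack(">IHH", 0xCAFEBABE, 0, 52)

print(elf_machine(elf_header))      # x86-64
print(class_version(class_header))  # (52, 0)
```

The class file header carries a format version (52 corresponds to Java 8) but no CPU field, which matches the architecture independence described above.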
Hope this helps. Really, your question was vague..
TG
Byte code is a 'halfway' step. The Java compiler (javac) turns the source code into byte code. Machine code is the next step: the JVM takes the byte code, turns it into machine code (which the processor can execute) and then runs your program. Computers cannot execute source code directly, and javac does not translate straight into machine code, so Java relies on this halfway step.
Note that ELF binaries don't necessarily need to be machine/arch specific per se.
The interesting piece is the "interpreter" header field: it holds the path name of a loader program that is executed instead of the actual binary. This loader is then responsible for loading the actual program, loading and linking libraries, etc. This is how, e.g., ld.so comes in.
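A sketch of where that field lives: the loader path is stored in a program header of type PT_INTERP. The function below handles only 64-bit little-endian ELF images, and the demo image is a hand-built minimal one (header, one program header, path string) so the example runs on any OS; `elf_interpreter` is a made-up name.

```python
import struct

PT_INTERP = 3  # program header type that holds the loader path


def elf_interpreter(data: bytes):
    """Return the PT_INTERP path of a 64-bit little-endian ELF, or None."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("expected a 64-bit little-endian ELF image")
    # ELF64 header: e_phoff at byte 32, e_phentsize at 54, e_phnum at 56.
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        # ELF64 program header: p_type (4), p_flags (4), p_offset (8), ...
        p_type, _flags, p_offset = struct.unpack_from("<IIQ", data, base)
        (p_filesz,) = struct.unpack_from("<Q", data, base + 32)
        if p_type == PT_INTERP:
            raw = data[p_offset:p_offset + p_filesz]
            return raw.rstrip(b"\x00").decode()
    return None


# Hand-built image: 64-byte header, one 56-byte program header, then the path.
path = b"/lib64/ld-linux-x86-64.so.2\x00"
ehdr = struct.pack("<16sHHIQQQIHHHHHH", b"\x7fELF\x02\x01\x01",
                   2, 62, 1, 0, 64, 0, 0, 64, 56, 1, 0, 0, 0)
phdr = struct.pack("<IIQQQQQQ", PT_INTERP, 4, 120, 0, 0,
                   len(path), len(path), 1)
image = ehdr + phdr + path

print(elf_interpreter(image))  # /lib64/ld-linux-x86-64.so.2
```

On a Linux system the same function could be pointed at the bytes of a real dynamically linked binary; a statically linked one has no PT_INTERP entry and yields `None`.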
Theoretically one could create an ELF binary that holds java bytecode (or a complete jar). This just needs some appropriate "interpreter" program which starts up a JVM and loads the code from the binary into it.
Not sure whether this has actually been done before, but it is certainly possible.
The same can be done with almost any non-native code.
It also could serve for direct multiarch support via some VM like qemu:
Let the target platform (libc+linker scripts) put the arch name into the interpreter program name (eg. /lib/ld.so.x86_64, /lib/ld.so.armhf, ...).
Then, on a particular arch (eg. x86_64), the one with native arch name will point to the original ld.so, while the others point to some special one that calls up something like qemu-system-XXX.
Compiled languages like C# and Java have just-in-time compilers that convert them (from byte code) into machine code (0s and 1s). How does an interpreted language like VBScript get converted into machine code? Is it done by the operating system?
They don't necessarily get converted to machine code (and often don't).
The interpreter for that program runs the appropriate actions according to what the program requires.
Some interpreters might generate machine code (using JIT compilers), others might stick to plain interpretation of the script.
I know this is old, but given that I can't comment (rep), I want to add a clarifying answer:
An interpreter is used to interpret the script (be it VBScript, javascript, python, or any other script) into individual instructions. These instructions can be in the form of machine code or an intermediate representation (that the OS or other program can use). Some interpreters are made for something closer to assembly language and the source code is more or less executed directly.
Most modern scripting languages (eg, Python, Perl, Ruby) are interpreted to an intermediate representation, or to an intermediate representation and into compiled (aka machine, aka object) code. The important distinction (vs compiled languages) is that an interpreter isn't taking an entire body of code and translating its meaning to machine code, it's taking each line at a time and interpreting its meaning as a standalone unit.
Think of this as the difference between translating an entire essay from English to Russian (compiled code) vs taking each sentence in the essay and translating it directly (interpreted code). You may get a similar effect, but the result won't be identical. More importantly, translating an entire essay as a total body of work takes a lot more effort than doing one sentence at a time as a standalone unit, but the whole translation will be much easier for Russian speakers to read than the rather clunky sentence-by-sentence version. Hence the tradeoff between compiling code vs interpreting code.
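One observable consequence of the essay-versus-sentence distinction can be sketched as follows. Translating the whole program first rejects a broken last line before anything runs, whereas handling one line at a time runs the earlier lines before hitting the error. This is only an illustration of the distinction drawn above: both halves use Python's `compile`, and CPython itself normally translates a whole module to bytecode before running it.

```python
source = 'log.append("first")\nlog.append("second")\nlog.append(\n'

# Whole-program translation: the broken last line is rejected up front,
# so nothing runs at all.
log = []
try:
    exec(compile(source, "<whole>", "exec"), {"log": log})
except SyntaxError:
    pass
whole = list(log)

# Line-at-a-time handling: earlier lines run before the error is reached.
log = []
try:
    for line in source.splitlines():
        exec(compile(line, "<line>", "exec"), {"log": log})
except SyntaxError:
    pass
by_line = list(log)

print(whole)    # []
print(by_line)  # ['first', 'second']
```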
Source: https://en.wikipedia.org/wiki/Interpreter_(computing), experience
This is the answer I was looking for. Like javascript engine, there used to be a vbscript engine, that converted human readable code to machine code. This vbscript engine is analogous to the JIT compiler in CLR and JVM. Only that it converts directly from human readable code to machine code. As opposed to C# having an intermediate byte code.
Referring to the VBScript Wikipedia article:
When VBScript is executed in a browser, vbscript.dll is used to interpret it.
When a VBScript file is executed from the command line or a batch file, cscript.exe is used to interpret it.
When VBScript is used by the Windows OS itself for various purposes, like showing error message boxes or yellow notification messages in the right corner of the task bar, it is interpreted using wscript.exe, the windowed version of Windows Script Host.
I am writing in Fortran and compiling using the g95 compiler.
I need to write a log file from a DLL I am writing, which currently links and runs with the master program but produces incorrect results. I don't know much about Fortran, but I did get the following code to produce output in an EXE I compiled:
OPEN(UNIT=3, FILE='LOG.txt', STATUS='NEW')
WRITE(3,*) "the gospel of PTP is bestowed upon the file."
CLOSE(3)
This works in a standalone EXE: when I run it, it produces a file with the string inside. But when I try to include it in the DLL I am working on, it crashes everything. When I comment it back out, everything runs and works again, but obviously doesn't produce the desired output.
Any ideas? Any FORTRAN or g95 people?
A guess which might help, or might not, I have rarely used Fortran DLLs to write anything directly:
Where do you expect the DLL to write the file 'LOG.txt'? Is it perhaps trying to write into a location it is forbidden to write to? Why that would crash your program I'm not sure, but it's something for you to check. I expect that you ran the EXE version of your code from one of your user directories.
And, a comment:
In general, avoid single-digit unit numbers in Fortran. Most compilers preconnect some of them to stdin, stdout, stderr, etc., and while there are usual assignments (e.g. stdin is usually 5 and stdout 6, I think, with stderr often 0), these are not defined in the Fortran standard and compiler writers are free to use unit numbers as they see fit.