What is the need for the JVM when you could just pass the source code?

I am new to Java and wanted to know: what is the need to create the .class file in Java?
Can't we just pass the source code to every machine, so that each machine can compile it according to its OS and hardware?

I believe it's mostly for efficiency reasons.
From Wikipedia (http://en.wikipedia.org/wiki/Bytecode):
Bytecode, also known as p-code (portable code), is a form of
instruction set designed for efficient execution by a software
interpreter. Unlike human-readable source code, bytecodes are compact
numeric codes, constants, and references (normally numeric addresses)
which encode the result of parsing and semantic analysis of things
like type, scope, and nesting depths of program objects. They
therefore allow *much better performance* than direct interpretation of
source code.
(my emphasis)
And, as others have mentioned, it provides weak obfuscation of the source code.

The main reason for the compilation step is that the virtual machines which host and run Java classes only understand bytecode, and compiling a class into a form the virtual machine understands every time it is needed would be expensive. That's why the source code is compiled into bytecode ahead of time.
There are also compilers which compile source code directly into machine code, but that's a different story which I don't know much about.
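For illustration, the round trip on a typical JDK looks like this (Hello is just an example class name):

javac Hello.java     compiles the source once into portable bytecode (Hello.class)
javap -c Hello       prints that bytecode (instructions like getstatic, ldc, invokevirtual)
java Hello           the JVM verifies the bytecode and executes it, JIT-compiling hot paths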

Related

Can different file extension executables be disassembled into the same instruction set OpCode?

This is a question from someone clueless about disassembly and decompiling in general, so bear with me. I am curious to know whether executable files (for example, those with the extensions listed in http://pcsupport.about.com/od/tipstricks/a/execfileext.htm) can be disassembled into assembly language, so that I can analyze opcode patterns across files.
My logic is that once all these different kinds of files are in opcode form, they are all on the same level, regardless of language barriers, etc., so it would be easier to analyze them.
How feasible is this?
EDIT: For example, I have an .exe file and an .app file. If I disassembled both, could I compare their opcodes on the same OS? If not, what about executables from the same OS: for example, could I compare opcodes across all executable files on Windows?
EDIT2: How will obfuscators affect my efforts?
In short, no.
The problem is that there is no practical universal instruction set. In practice, every computer architecture has its own instruction set (or sometimes several instruction sets). A native executable format like .exe is compiled to the machine's instruction set, which will differ based on the ISA targeted.
I'm not familiar with the .app format, but it appears to be some sort of archive containing executable code. So if you have an .exe and an .app targeting the same ISA, you could conceivably disassemble and compare them.
Obfuscation makes things much harder because it is difficult to get a reliable disassembly, let alone deal with stuff like self modifying code.
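As a rough sketch of that comparison (the file names are hypothetical, and it assumes both binaries target the same ISA and that your objdump build understands their executable formats):

objdump -d first.exe > first.asm
objdump -d second.exe > second.asm
diff first.asm second.asm

Any disassembler that can parse the container format (PE, Mach-O, ELF) would do in place of objdump.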

In which language is the proto compiler (of google protocol buffers) written?

I would like to know in which language the "proto compiler" (the compiler used to generate Java, Python or C++ source files from .proto files) is written. Is it maybe a mix of languages?
Any help would be appreciated.
Thanks in advance,
Horace
It appears to be written in C++. There's also documentation on Java and Python APIs, but those don't appear to contain the compiler itself (at least I don't see anything that's obviously the compiler in either case, though I didn't spend a whole lot of time looking for it either).
That said, I'm almost tempted to vote to close -- for most practical purposes, the language used to implement the compiler is basically a trivia question, irrelevant to actual use. There is, however, an entirely legitimate exception: if you're going to download and modify the compiler, knowing the language you'd need to work with could be quite useful.
The protoc compiler is written in C or C++ (it's a native program, anyway).
When I want to process .proto files in Java, I:
1. Use the protoc command to convert them to a protocol buffer file, i.e.
protoc protofile.proto --descriptor_set_out=OutputFile
2. Read the new protocol buffer file (it's a FileDescriptorSet) and use it.
An over-complicated example is the compileProto method in
http://code.google.com/p/protobufeditor/source/browse/trunk/%20protobufeditor/Source/ProtoBufEditor/src/net/sf/RecordEditor/ProtoBuf/re/display/ProtoLayoutSelection.java
It's complicated because the protoc command and its options can be stored in a properties file.
Note: the getFileDescriptor method reads the newly created protocol buffer file.
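A minimal sketch of step 2 in Java, assuming protobuf-java is on the classpath and OutputFile is the descriptor set produced by the protoc command above (the class name is made up):

    import java.io.FileInputStream;
    import com.google.protobuf.DescriptorProtos.FileDescriptorProto;
    import com.google.protobuf.DescriptorProtos.FileDescriptorSet;

    public class ReadDescriptorSet {
        public static void main(String[] args) throws Exception {
            try (FileInputStream in = new FileInputStream("OutputFile")) {
                // The descriptor set is itself a protocol buffer message
                FileDescriptorSet set = FileDescriptorSet.parseFrom(in);
                for (FileDescriptorProto file : set.getFileList()) {
                    System.out.println("proto file: " + file.getName());
                    // List the message types declared in each .proto file
                    file.getMessageTypeList()
                        .forEach(msg -> System.out.println("  message: " + msg.getName()));
                }
            }
        }
    }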

What converts vbscript to machine code?

Compiled languages like C# and Java have just-in-time compilers that convert them (from bytecode) into machine code (0s and 1s). How does an interpreted language like VBScript get converted into machine code? Is it done by the operating system?
They don't necessarily get converted to machine code (and often don't).
The interpreter for that program runs the appropriate actions according to what the program requires.
Some interpreters might generate machine code (using JIT compilers), others might stick to plain interpretation of the script.
I know this is old, but given that I can't comment (rep), I want to add a clarifying answer:
An interpreter is used to interpret the script (be it VBScript, JavaScript, Python, or any other script) into individual instructions. These instructions can be in the form of machine code or an intermediate representation (that the OS or another program can use). Some interpreters are made for something closer to assembly language, and the source code is more or less executed directly.
Most modern scripting languages (e.g. Python, Perl, Ruby) are interpreted to an intermediate representation, or to an intermediate representation and then into compiled (a.k.a. machine, a.k.a. object) code. The important distinction (vs. compiled languages) is that an interpreter isn't taking an entire body of code and translating its meaning to machine code; it's taking each line at a time and interpreting its meaning as a standalone unit.
Think of this as the difference between translating an entire essay from English to Russian (compiled code) vs taking each sentence in the essay and translating it directly (interpreted code). You may get a similar effect, but the result won't be identical. More importantly, translating an entire essay as a total body of work takes a lot more effort than doing one sentence at a time as a standalone unit, but the whole translation will be much easier for Russian speakers to read than the rather clunky sentence-by-sentence version. Hence the tradeoff between compiling code vs interpreting code.
Source: https://en.wikipedia.org/wiki/Interpreter_(computing), experience
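To make that distinction concrete, here is a toy line-at-a-time interpreter in Java; the three-instruction mini-language is invented purely for illustration:

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class ToyInterpreter {
        public static void main(String[] args) {
            // The "script": set x to 5, add 3 to it, print it
            List<String> script = List.of("set x 5", "add x 3", "print x");
            Map<String, Integer> vars = new HashMap<>();
            // One line at a time, as a standalone unit: no whole-program analysis
            for (String line : script) {
                String[] t = line.split(" ");
                switch (t[0]) {
                    case "set"   -> vars.put(t[1], Integer.parseInt(t[2]));
                    case "add"   -> vars.merge(t[1], Integer.parseInt(t[2]), Integer::sum);
                    case "print" -> System.out.println(vars.get(t[1])); // prints 8
                }
            }
        }
    }

A compiler, by contrast, would read the whole script first, analyze it as a body of work, and only then emit code for it.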
This is the answer I was looking for. Like today's JavaScript engines, there used to be a VBScript engine that converted human-readable code into something the machine could execute. That engine is loosely analogous to the JIT compilers in the CLR and the JVM, except that it works directly from human-readable code, as opposed to C#, which goes through an intermediate bytecode.
Referring to the VBScript Wikipedia article:
When VBScript is executed in a browser, vbscript.dll is used to interpret it.
When a VBScript file is executed from the command line or a batch file, cscript.exe is used to interpret it.
When VBScript is used by the Windows OS itself for various purposes, such as showing error message boxes or the yellow notification messages in the corner of the task bar, it is interpreted by wscript.exe, the windowed host of the Windows Script Host.
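For example, running a (hypothetical) myscript.vbs under each command-line host:

cscript //nologo myscript.vbs     console host: output goes to the console
wscript myscript.vbs              windowed host: output appears in dialog boxes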

Are all scripts written in scripting languages?

I'm confused by the concept of scripts.
Can I say that a makefile is a kind of script?
Are there scripts written in C or Java?
I'd refer to Wikipedia for a detailed explanation.
"Scripts" usually refer to a piece of code or set of instructions that run in the context of another program. They usually aren't a standalone executable piece of software.
Makefiles are scripts that are run by make, MSBuild, etc.
C needs to be compiled into an executable or a library, so programs written in (standard) C would typically not be considered scripts. (There are exceptions, but this isn't the normal way of working with C.)
Java (and especially .NET) is a bit different. A typical Java program is compiled and run as an executable, but this is a grey area: it is possible to compile a "script" written in Java at runtime and execute it, as sketched below.
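Here is a minimal sketch of that grey area using the standard javax.tools API (it requires a full JDK, not just a JRE; the file name is just an example, and the freshly compiled class is assumed to land on the classpath):

    import javax.tools.JavaCompiler;
    import javax.tools.ToolProvider;

    public class RunJavaAsScript {
        public static void main(String[] args) throws Exception {
            // Returns null when running on a bare JRE
            JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
            // Compile the "script" in-process; 0 means success
            int status = compiler.run(null, null, null, "HandyUtility.java");
            if (status == 0) {
                // Load the freshly compiled class and invoke its main method
                Class<?> cls = Class.forName("HandyUtility");
                cls.getMethod("main", String[].class).invoke(null, (Object) new String[0]);
            }
        }
    }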
In a very general sense, the term "script" relates to code that is deployed and expected to run from its lexical representation. As soon as you compile the code and distribute the resulting output instead of the code, it ceases to be a script.
Minification and obfuscation of a script are not considered compilation, and the result is still considered a script.
It depends on your definition of script. For me, a script could be any small program you write for a small purpose. They are usually written in interpreted languages. However, there's nothing stopping you from writing a small program in a compiled language.
For me a script has to consist of a single file. And that file must be able to perform the task for which the script was written with no intermediate steps.
So these would be OK:
bash backup_my_home_dir.sh
perl munge_some_text.pl
python download_url.py
But this wouldn't qualify, even if the file is small:
javac HandyUtility.java
java HandyUtility
Yes, it's possible to do scripting in Java. I've seen it many times :)
(that was sarcasm, referring to bad spaghetti code)
The term 'scripting' can cover a fairly broad spectrum of activities: examples being programming in imperative interpreted languages such as VBScript or Python, shell scripts such as csh or bash, or expressing a task in declarative languages such as XSL, SQL or Erlang.
Some scripting languages fall into a category referred to as Domain Specific Languages (DSLs). Good examples of DSLs are makefiles, many other types of configuration files, SQL, XSL and so on.
What you're asking is fairly subjective, one man's script is another man's application. If your interpretation of scripting means that using scripting languages should not force a user to follow the traditional compile -> link -> run cycle, then you could form the opinion that you can't write 'scripts' in C or Java.
A script is basically a non-compilable text file in almost any language, or shell, with an interpreter that is used to automate some process, or list of commands, that you perform repeatedly. Scripts are often used for backing up files, compiling routines, svn commits, shell initialization, etc., ad infinitum. There are a million and one things you can do with a script that an executable (complete with installation, etc.) would simply be overkill for.
I write scripts in F#. A recent one is a small data loader to take in some set of data, do a bit of processing to it, and dump it in a DB. ~40 lines. No separate compilation step needed; I can just make F# Interactive run it directly.
The benefit is that I get a fully powered language with a great IDE and all the safety that static checking provides, while type inference keeps it from getting as verbose as, say, Java or C#.
So, that's one language that offers a reasonably decent type system, compilation and checking, isn't interpreted, but works fine for scripting.

Refactoring disassembled code

You write a function and, looking at the resulting assembly, you see it can be improved.
You would like to keep the function you wrote, for readability, but you would like to substitute your own assembly for the compiler's. Is there any way to establish a relationship between your high-level language function and the new assembly?
If you are looking at the assembly, then it's fair to assume that you have a good understanding of how code gets compiled down. If you have this knowledge, then it's sometimes possible to 'reverse engineer' the changes back up into the original language, but it's often better not to bother.
The optimisations that you make are likely to be very small in comparison to the time and effort required to make these changes in the first place. I would suggest that you leave this kind of work to the compiler and go have a cup of tea. If the changes are significant, and the performance is critical (as, say, in the embedded world), then you might want to mix the normal code with the assembler in some fashion; however, on most computers and chips the performance is usually sufficient to avoid this headache.
If you really need more performance, then optimise the code not the assembly.
None, I suppose. You've rejected the compiler's work in favor of your own. You might as well throw out the function you wrote in the compiled language, because now all you have is your assembly for that platform.
I would highly advise against engaging in this kind of optimization unless you're sure, via profiling and analysis, that you truly are making a difference.
It depends on the language you wrote your function in. Some languages like C are very low-level, translating each function call or statement to specific assembly statements. If you did use C, you can replace your function with inline assembly to improve performance.
Other high-level languages may convert each statement into macro routines or other more complex calls on the assembly side. Certain optimizations (like tail recursion, loop unrolling, etc) can be implemented easily on the source side, but others (like making more efficient use of the register file) may be impossible (again, depending on the language and the compiler you're using).
It's tough to say there is any relationship between modified assembly and the source which generated the unmodified version. It will certainly confuse debugging tools: register contents will no longer match the source variables they were supposed to correspond to.
There are a number of places in packet processing code where I've examined the generated assembly and gone back to change the original source code in order to improve the result. Re-arranging source can reduce the number of branches; __attribute__ annotations and compiler arguments can align branch points and functions to reduce instruction-cache misses. In desperate cases a little inline assembly can be used, so that the binary can still be compiled from source.
Something you could try is to separate your original function into its own file, and provide a make rule to build the assembler from there. Then update the assembler file with your improved version, and provide a make rule to build an object file from the assembler file. Then change your link rules to include that object file.
If you only ever change the assembler file, that will keep on being used. If you ever change the original higher-level language file, the assembler file will be rebuilt and the object file built from the new (unimproved) version.
This gives you a relationship between the two; you probably want to add a warning comment at the top of the higher-level language file to warn about the behaviour. Using some form of VCS will give you the ability to recover the improved assembler file if you make a mistake here.
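A sketch of those make rules, assuming a C toolchain (all file names are hypothetical, and recipes must be indented with tabs):

    # Regenerate the assembler file whenever the C source changes;
    # this clobbers hand edits, hence the warning comment suggested above.
    hotloop.s: hotloop.c
    	$(CC) $(CFLAGS) -S -o $@ $<

    # Assemble the (possibly hand-tuned) file into an object file
    hotloop.o: hotloop.s
    	$(CC) -c -o $@ $<

    # Link the improved object file into the final program
    myprog: main.o hotloop.o
    	$(CC) -o $@ $^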
If you're writing a native compiled app in Visual C++, there are two methods:
Use the __asm { } block and write your assembler in there.
Write your functions in MASM assembler, assemble to .obj, and link it as a static library. In your C/C++ code, declare the function with an extern "C" declaration.
Other C/C++ compilers have similar approaches.
In this situation, you generally have two options: optimize the code or rewrite the compiler. I can't see how breaking the link between the source and the generated code is ever going to be the correct solution.