Why isn't all the java bytecode initially interpreted to machine code? - jvm

I read about Just-in-time compilation (JIT) and as I understood, there are two approaches for this – Interpreter and JIT, both of which interpreting the bytecode at runtime.
Why not just preparatively interprete all the bytecode to machine code, and only then start to run the process with no more need for interpreter?

Another reason for late JIT compiling has to do with optimization: At run-time the VM can detect more/other patterns it may optimize than the compiler could ever do at compile-time. JIT pre-compiling at startup will always have to be static, and the same could have been done by the compiler already, but through analysis of the actual run-time behaviour the VM may have more information on possible optimizations and may therefore produce better optimization results.
For example, the VM can detect that a single piece of code is actually run a million times at run-time and perform appropriate optimizations which the compiler may have no information about, not unlike the branch prediction that's done at runtime in modern CPUs.
More information can be found in the Wikipedia article on "Adaptive optimization".

Simple: Because it takes time to precompile everything to machine code. And users don't want to wait on the application to start. Remember, the precompilation would have to make a lot of optimizations which takes time.
The server version of JVM is more aggressive in precompiling and optimizing code upfront because code on the server side tends to be executed more often and for a longer period of time before the process is shutdown.
However, a solution (for .Net) is an application called NGen which make the precompilation upfront such that it isn't needed after that point. You only have to run that once.
Not all VM's include an interpreter. For instance Chrome and CLR (.Net) always compiles to machine code before running. However, they have multiple levels of optimizations to reduce the startup time.

I found link showing how runtime recompilation can optimize performance and save extra CPU cycles.
Inlining expansion: To decrease the cost of procedure calls.
Removing redundant loads: When 2 compiled code results in some duplicate code then it can be removed and further optimised by recompilation at run time.
Copy propagation
Eliminating dead code
Here is another link for the same explanation given above.

Related

Is there a way to turn off JIT compiler and is there a performance impact by doing so?

What does it mean for a Java program to be JIT'ed and does it make the execution a lot more faster? Or are there bytecodes which are not JIT'ed?
There is two ways to disable the JIT
-Djava.compiler=NONE
or this will almost never compile anything
-XX:CompileThreshold=2000000000
or on IBM JVM
-nojit
Disabling the JIT can slow down your code a lot e.g. 50x but not always. If you spend most of your time doing IO or GUI updates you might find it makes little difference.
From the doc,
-Xint
Runs the application in interpreted-only mode. Compilation to native code is disabled, and all bytecode is executed by the interpreter.
The java.compiler system property is known by Compiler class before Java 9. In Java 9 it's marked as #Deprecated.
For IBM the correct option is -Xnojit or -Xint
In HotSpot VM JIT compilation can be disabled with the option -XX:-UseCompiler which skips compiler threads initialization.
It is true by default:
product(bool, UseCompiler, true,
"Use Just-In-Time compilation")

What happens if an MPI process crashes?

I am evaluating different multiprocessing libraries for a fault tolerant application. I basically need any process to be allowed to crash without stopping the whole application.
I can do it using the fork() system call. The limit here is that the process can be created on the same machine, only.
Can I do the same with MPI? If a process created with MPI crashes, can the parent process keep running and eventually create a new process?
Is there any alternative (possibly multiplatform and open source) library to get the same result?
As reported here, MPI 4.0 will have support for fault tolerance.
If you want collectives, you're going to have to wait for MPI-3.something (as High Performance Mark and Hristo Illev suggest)
If you can live with point-to-point, and you are a patient person willing to raise a bunch of bug reports against your MPI implementation, you can try the following:
disable the default MPI error handler
carefully check every single return code from your MPI programs
keep track in your application which ranks are up and which are down. Oh, and when they go down they can never get back. but you're unable to use collectives anyway (see my opening statement), so that's not a huge deal, right?
Here's an old paper (back when Bill still worked at Argonne. I think it's from 2003):
http://www.mcs.anl.gov/~lusk/papers/fault-tolerance.pdf . It lays out the kinds of fault tolerant things one can do in MPI. Perhaps such a "constrained MPI" might still work for your needs.
If you're willing to go for something research quality, there's two implementations of a potential fault tolerance chapter for a future version of MPI (MPI-4?). The proposal is called User Level Failure Mitigation. There's an experimental version in MPICH 3.2a2 and a branch of Open MPI that also provides the interfaces. Both are far from production quality, but you're welcome to try them out. Just know that since this isn't in the MPI Standard, the function prefixes are not MPI_*. For MPICH, they're MPIX_*, for the Open MPI branch, they're OMPI_* (though I believe they'll be changing theirs to be MPIX_* soon as well.
As Rob Latham mentioned, there will be lots of work you'll need to do within your app to handle failures, though you don't necessarily have to check all of your return codes. You can/should use MPI error handlers as a callback function to simplify things. There's information/examples in the spec available along with the Open MPI branch.

SIGSEGV in optimizated ifort

If I compile with -O0 in ifort, the program can run correctly. But as long as I open the optimization option, like -O, -O3, -fast, there will be a SIGSEGV segmentation fault come out.
This error occurred in a subroutine named maketable(). And the belows are the phenomenons:
(1) I call fftw library in this subroutine. If I comment the sentences about fftw, it'll be ok. But I think it's not the fault of fftw, because I also use fftw in some other places of this code, and they are good.
(2) the fftw is called in a loop, and the loop can run several times when the program crashed. The segfault does not happen at the first time of entering the loop.
(3) I considered the stack overflow, but I don't think so now. I have the executable file complied by others long time ago, it's can be executed in my computer. I think that suggests it's not due to the system stack overflow.
The version of ifort is 10.0, of fftw is fftw-2.1.5. The cpu type is intel xeon 5130. Thanks a lot.
There are two common causes of segmentation faults in Fortran programs:
Attempting to access an element outside the bounds of an array.
Mismatching actual and dummy arguments in a procedure call.
Both are relatively easy to find:
Your compiler will have an option to generate code which performs array bounds checking at run time. Check your compiler documentation, rebuild your code and rerun it. If this is the cause of the problem you will get an error message identifying where your code goes awry.
Program explicit interfaces for any subroutines and functions in your program, or use modules so that the compiler generates such interfaces for you, or use a compiler option (see the documentation) to check that argument types match at compile-time.
It's not unusual that such errors (seem to) arise only when optimisation is turned up high.
EDIT
Note that I'm not suggesting that optimisation causes the error you observe, but that it causes the error to affect the execution of your program and become evident.
It's not unknown for incorrect programs to run many times apparently without fault only for, say, recompilation with a new compiler version to create an executable which crashes every time.
Your wish to switch off optimisation only for the subroutine where the segmentation fault seems to arise is, I suggest, completely wrong-headed. I expect my programs to execute correctly at any level of optimisation (save for clear evidence of a compiler bug, such things are not unknown). I think that by turning off optimisation you are sweeping a real problem with your program under the carpet, as it were.

Linux process activities

Is there possibility to show what's going on under specified process in Linux?
For example, i run SQL query -> select evil_function();
and notice that process under Linux uses all cpu.
So is there something with what I can see whats going on under this process?
What I want is to see what queries is running under this process.
Thanks!
strace will tell you what system calls the process is making.
To see what called routines are taking the most CPU, you need to run a profiling tool, and make sure the executable of the process you in compiled correctly (sometimes it needs to be instrumented during compilation for profiling, sometimes it just needs to be compiled with debug symbols, or not stripped of them after compilation).
You might want to look at oprofile, valgrind, gprof and for starters on free tools - there are also commercial products available.
Here are a few links:
http://www.pixelbeat.org/programming/profiling/
http://en.wikipedia.org/wiki/List_of_performance_analysis_tools
You are mixing a whole bunch of things.
If you are talking about MySQL do:
show processlist;
For info specifically about linux processes, you can strace the process to get a list of system function that it calls. Unless you are experienced with linux this will be useless to you.
If the process is paused then you can find out what function it is stopped on, but that's probably not what you want, since you say the process is running.
There are also various tools that can give you info on what parts of the disk the process is reading, and how much memory it's allocating.
And finally you can use gdb to break into the process and single step your way through it to see exactly what it's doing. This will also likely be useless to you since an SQL server does a LOT of things - far to many to understand by this method.

Any Macro or Technic for Part Optimization?

I am working on lock free structure with g++ compiler. It seems that with -o1 switch, g++ will change the execution order of my code. How can I forbid g++'s optimization on certain part of my code while maintain the optimization to other part? I know I can split it to two files and link them, but it looks ugly.
If you find that gcc changes the order of execution in your code, you should consider using a memory barrier. Just don't assume that volatile variables will protect you from that issue. They will only make sure that in a single thread, the behavior is what the language guarantees, and will always read variables from their memory location to account for changes "invisible" to the executing code. (e.g changes to a variable done by a signal handler).
GCC supports OpenMP since version 4.2. You can use it to create a memory barrier with a special #pragma directive.
A very good insight about locking free code is this PDF by Herb Sutter and Andrei Alexandrescu: C++ and the Perils of Double-Checked Locking
You can use a function attribute "__attribute__ ((optimize 0))" to set the optimization for a single function, or "#pragma GCC optimize" for a block of code. These are only for GCC 4.4, though, I think - check your GCC manual. If they aren't supported, separation of the source is your only option.
I would also say, though, that if your code fails with optimization turned on, it is most likely that your code is just wrong, especially as you're trying to do something that is fundamentally very difficult. The processor will potentially perform reordering on your code (within the limits of sequential consistency) so any re-ordering that you're getting with GCC could potentially occur anyway.