How to debug a SIGSEGV in the JVM GCTaskThread - crash

My application is experiencing crashes in production.
The crash dump indicates that a SIGSEGV occurred in GCTaskThread.
It uses JNI, so memory corruption from native code is a possible cause, although I can't be sure.
How can I debug this problem? I thought of using -XX:OnError... but I am not sure what would help me debug this.
Also, can anyone give a concrete example of how JNI code can crash the GC with a SIGSEGV?
EDIT:
OS:SUSE Linux Enterprise Server 10 (x86_64)
vm_info: Java HotSpot(TM) 64-Bit Server VM (11.0-b15) for linux-amd64 JRE (1.6.0_10-b33), built on Sep 26 2008 01:10:29 by "java_re" with gcc 3.2.2 (SuSE Linux)
EDIT:
The issue stopped occurring after we disabled hyper-threading. Any thoughts?

Errors in JNI code can occur in several ways:
The program crashes during execution of a native method (most common).
The program crashes some time after returning from the native method, often during GC (not so common).
Bad JNI code causes deadlocks shortly after returning from a native method (occasional).
If you think that you have a problem with the interaction between user-written native code and the JVM (that is, a JNI problem), you can run diagnostics that help you check the JNI transitions. To invoke these diagnostics, specify the -Xcheck:jni option when you start up the JVM.
The -Xcheck:jni option activates a set of wrapper functions around the JNI functions. The wrapper functions perform checks on the incoming parameters. These checks include:
Whether the call and the call that initialized JNI are on the same thread.
Whether the object parameters are valid objects.
Whether local or global references refer to valid objects.
Whether the type of a field matches the Get<Type>Field or Set<Type>Field call.
Whether static and nonstatic field IDs are valid.
Whether strings are valid and non-null.
Whether array elements are non-null.
The types on array elements.
Please read the following links:
http://publib.boulder.ibm.com/infocenter/javasdk/v5r0/index.jsp?topic=/com.ibm.java.doc.diagnostics.50/html/jni_debug.html
http://www.oracle.com/technetwork/java/javase/clopts-139448.html#gbmtq

Use valgrind. This sounds like memory corruption. The output will be verbose, but try to isolate the report to the JNI library if possible.

Since the faulting thread seems to be GCTaskThread, did you try enabling -verbose:gc and analyzing the output (preferably with a graphical tool such as samurai)? Are you able to isolate a specific library after examining the hs_err file?
Also, can you please provide more information on what causes the issue and if it is easily reproducible?
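If restarting the production JVM with -verbose:gc is inconvenient, a rough view of collector activity can also be read in-process through the standard java.lang.management API. The sketch below is only an illustration: the class name GcStats and the output format are my own, and it will not locate the corruption by itself. It just shows how often each collector has run, which may help correlate crashes with GC cycles.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcStats {
    // Builds one line per garbage collector registered in this JVM.
    // GcStats and the line format are hypothetical, chosen for illustration.
    static String describe() {
        StringBuilder out = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            out.append(gc.getName())
               .append(": count=").append(gc.getCollectionCount())   // collections so far
               .append(", timeMs=").append(gc.getCollectionTime())   // accumulated time in ms
               .append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

The collector names that appear depend on the JVM and on the GC selected at startup, so treat them as informational only.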

Related

JVM step by step simulator

Is there a free JVM implementation that lets you see the contents of the different parts of the Java Virtual Machine (e.g., call stack, heap) and execute a program step by step?
Once the JIT compiles the bytecode to native code, the VM registers and stack have little meaning.
I would use your debugger to see what the Java program is doing line by line. The bytecode is for a virtual machine, not an actual one and the JVM doesn't have to follow the VM literally, only what the program does.
The JIT can
use the many registers your CPU has rather than use a pure stack.
inline code rather than perform method calls.
remove code which it determines isn't used.
place objects on the stack.
not synchronize objects which are only used in a local method.
A good tool for seeing how the code is translated from bytecode to machine code is JITWatch.
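The last two items in the list (placing objects on the stack and skipping synchronization on method-local objects) both rely on the JIT proving that an object never leaves the method. Here is a minimal sketch of code that is a candidate for this, assuming HotSpot. The class and method names are made up for illustration, and whether the JIT actually applies the optimization is not guaranteed and cannot be seen in the program's output.

```java
public class EscapeDemo {
    // The StringBuffer is created, used, and dropped inside this method.
    // Its append() methods are synchronized, but since no other thread can
    // ever see the object, the JIT may remove the locking and may avoid a
    // real heap allocation for it. Only the resulting String escapes.
    static String buildLabel(int id) {
        StringBuffer sb = new StringBuffer();
        sb.append("item-");
        sb.append(id);
        return sb.toString();
    }

    public static void main(String[] args) {
        int totalLength = 0;
        // Call the method often enough that the JIT is likely to compile it.
        for (int i = 0; i < 1_000_000; i++) {
            totalLength += buildLabel(i % 10).length();
        }
        System.out.println(totalLength);
    }
}
```

Stepping through this in a debugger still shows the StringBuffer as an ordinary object, which is the point made above: the source-level view stays the same regardless of what the compiled code does.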

OpenJDK debug with printf?

I am hacking OpenJDK7 to implement an algorithm. In the process of doing this, I need to output debug information to stdout. As far as I can see in the code base, all printing is done using outputStream*->print_cr(). I wonder why printf() was not used at all?
Part of the reason I'm asking is that I have in fact used a lot of printf() calls, and I have been seeing weird bugs such as random memory corruption and random JVM crashes. Is there any chance that my printf() is the root cause? (Assume that the logic of my code is bug-free, of course.)
why printf() was not used at all?
Instead of using stdio directly, HotSpot utilizes its own printing and logging framework. This extra abstraction layer provides the following benefits:
Allows printing not only to stdout but to an arbitrary stream. Different JVM parts may log to separate streams (e.g. a dedicated stream for GC logs).
Has its own implementation of formatting and buffering that does not allocate memory or use global locks.
Gives control over all output emitted by JVM. For example, all output can be easily supplemented with timestamps.
Facilitates porting to different platforms and environments.
The framework is further improved in JDK 9 to support JEP 158: Unified JVM Logging.
Is there any chance that my printf() is the root cause?
No, unless printf is misused: e.g. arguments do not match format specifiers, or printf is called inside a signal handler. Otherwise it is safe to use printf for debugging. I did so many times when I worked on HotSpot.

Why does LLDB refuse to break on compiled Objective-C methods?

I have a compiled Objective-C binary on iOS 8.1 which I am attempting to debug with lldb on my machine and debugserver on the handset. (No Xcode involved, though I am willing to get it involved if that is the issue.)
IDA can correctly recognize the binary as Objective-C and decompose objects and component messages. Because of this, I would expect commands like
platform select remote-ios
connect://ip:port
breakpoint set --name "-[Login doLoginStuff]"
to correctly function, but this method is called in code without breaking in lldb.
Is there the need for some type of target call to hint to the debugger what the remote architecture or SDK target is?
Without the symbols, I don't believe lldb can map -[Login doLoginStuff] to a memory address. If it can't find the name, it fails silently as far as I remember.

Why isn't all the Java bytecode initially interpreted to machine code?

I read about just-in-time compilation (JIT) and, as I understood it, there are two approaches, the interpreter and the JIT, both of which interpret the bytecode at runtime.
Why not just interpret all the bytecode to machine code in advance, and only then start running the process, with no further need for an interpreter?
Another reason for late JIT compiling has to do with optimization: At run-time the VM can detect more/other patterns it may optimize than the compiler could ever do at compile-time. JIT pre-compiling at startup will always have to be static, and the same could have been done by the compiler already, but through analysis of the actual run-time behaviour the VM may have more information on possible optimizations and may therefore produce better optimization results.
For example, the VM can detect that a single piece of code is actually run a million times at run-time and perform appropriate optimizations which the compiler may have no information about, not unlike the branch prediction that's done at runtime in modern CPUs.
More information can be found in the Wikipedia article on "Adaptive optimization".
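To make the "run a million times" case concrete, here is a small sketch, assuming HotSpot. The class and method names are hypothetical. Running it with the HotSpot flag -XX:+PrintCompilation should list square and sumOfSquares once the VM's counters decide they are hot, although the exact thresholds and output format vary between JVM versions.

```java
public class HotLoopDemo {
    // A tiny method that becomes "hot" only because of how often it is
    // called at run time; nothing in the source marks it as special.
    static int square(int x) {
        return x * x;
    }

    // Calls square() n times and accumulates the results.
    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += square(i % 100);
        }
        return sum;
    }

    public static void main(String[] args) {
        // One million iterations, as in the example in the text.
        System.out.println(sumOfSquares(1_000_000));
    }
}
```

A static compiler sees only one call site here; the VM sees the actual call counts and can spend its compilation effort where the time really goes.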
Simple: because it takes time to precompile everything to machine code, and users don't want to wait for the application to start. Remember, the precompilation would have to perform a lot of optimizations, which take time.
The server version of the JVM optimizes code more aggressively, at the cost of slower startup, because code on the server side tends to be executed more often and for a longer period of time before the process is shut down.
However, a solution (for .NET) is an application called NGen, which performs the precompilation up front so that it isn't needed after that point. You only have to run it once.
Not all VMs include an interpreter. For instance, Chrome and the CLR (.NET) always compile to machine code before running. However, they have multiple levels of optimization to reduce the startup time.
I found a link showing how runtime recompilation can optimize performance and save CPU cycles:
Inline expansion: To decrease the cost of procedure calls.
Removing redundant loads: When compiled code loads the same value more than once, the duplicate loads can be removed and the code further optimised by recompilation at run time.
Copy propagation
Eliminating dead code
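The optimizations in this list can be illustrated by hand. The sketch below shows a method as a programmer might write it and, next to it, roughly what is left after copy propagation, dead-code elimination, and inlining. The names are hypothetical, and the "optimized" version is written manually to show the idea; it is not actual JIT output.

```java
public class OptimizationSketch {
    static int helper(int x) {
        return x * 2;
    }

    // As written in the source.
    static int original(int a) {
        int b = a;            // plain copy of a
        int c = b + 1;        // copy propagation turns this into a + 1
        int unused = a * 42;  // never read afterwards: dead code
        return helper(c);     // small call, a candidate for inlining
    }

    // Hand-written equivalent after the three transformations.
    static int optimized(int a) {
        return (a + 1) * 2;
    }

    public static void main(String[] args) {
        // Both versions compute the same result for every input.
        System.out.println(original(20) == optimized(20));
    }
}
```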
Here is another link for the same explanation given above.

Loader lock detected twain

I am using TwainPro4.dll for scanning in my VB.NET application (.NET Framework v3.5).
When I run my application I get the exception below. Please advise.
LoaderLock was detected
Message: DLL 'C:\WINDOWS\assembly\GAC\PegasusImaging.WinForms.TwainPro4\4.0.22.0__80d669b8b606a2da\PegasusImaging.WinForms.TwainPro4.dll' is attempting managed execution inside OS Loader lock. Do not attempt to run managed code inside a DllMain or image initialization function since doing so can cause the application to hang.
I am assuming you mean when you debug your application you get this message. This message is important to understand. From MSDN:
"The loaderLock managed debugging assistant (MDA) detects attempts to execute managed code on a thread that holds the Microsoft Windows operating system loader lock. Any such execution is illegal because it can lead to deadlocks and to use of DLLs before they have been initialized by the operating system's loader."
Now, to get the application to run in debug mode, you can disable the LoaderLock MDA in the Debug Exceptions interface by pressing Ctrl+D, E, then opening the Managed Debugging Assistants tree and unchecking LoaderLock.
However, this indicates that the DLL is being initialized or was written improperly. Again from MSDN:
Typically, several threads inside the process will deadlock. One of those threads is likely to be a thread responsible for performing a garbage collection, so this deadlock can have a major impact on the entire process. Furthermore, it will prevent any additional operations that require the operating system's loader lock, like loading and unloading assemblies or DLLs and starting or stopping threads.
In some unusual cases, it is also possible for access violations or similar problems to be triggered in DLLs which are called before they have been initialized.
You may want to go back to the developer of the DLL and see what their approved resolution is.
Sources
http://msdn.microsoft.com/en-us/library/ms172219
http://msdn.microsoft.com/en-us/library/aa290048%28VS.71%29.aspx