Creating a JVM via JNI_CreateJavaVM, receiving an OutOfMemoryError - jvm

I'm creating a JVM from a C++ program via JNI, and the creation itself works fine. The communication with the JVM also works fine; I am able to find classes, create objects, call methods and so on. But one of my methods needs quite a lot of memory, and the JVM throws an OutOfMemoryError when I call it. This I don't understand, as there is more than one GB of free RAM available. The whole process uses about 200 MB, and it seems it doesn't even try to allocate more; it sticks at 200 MB and then the exception is thrown.
I tried to pass the -Xmx option to the JVM, but it didn't seem to have any effect when the JVM was created through JNI. As far as I understood, a JVM created through JNI should be able to access all the available memory, making the -Xmx option unnecessary - but obviously this assumption is wrong.
So the question is: how can I tell the JVM to use as much memory as it needs?
System: Mac OS X 10.6
Creation of the JVM:
JavaVM *jvm;
JNIEnv *env;
JavaVMInitArgs vm_args;
JavaVMOption options;
options.optionString = jvm_options; // the classpath string, e.g. "-Djava.class.path=...", defined elsewhere
vm_args.version = JNI_VERSION_1_6;  // request a JNI 1.6 compatible VM
vm_args.nOptions = 1;
vm_args.options = &options;
vm_args.ignoreUnrecognized = JNI_FALSE;
int ret = JNI_CreateJavaVM(&jvm, (void**)&env, &vm_args);
if (ret != JNI_OK)
    printf("\nUnable to launch JVM\n");

It seems I had gotten something wrong with the -Xmx option earlier - I tried it again and it works now.
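For anyone hitting the same wall, here is a minimal sketch of passing -Xmx alongside the classpath when creating the VM through JNI. The classpath and heap size are placeholders; adjust them to your setup:

#include <jni.h>
#include <cstdio>

int main() {
    JavaVM *jvm;
    JNIEnv *env;

    JavaVMOption options[2];
    options[0].optionString = const_cast<char*>("-Djava.class.path=/path/to/classes"); // placeholder classpath
    options[1].optionString = const_cast<char*>("-Xmx512m"); // raise the maximum heap size

    JavaVMInitArgs vm_args;
    vm_args.version = JNI_VERSION_1_6;
    vm_args.nOptions = 2;
    vm_args.options = options;
    vm_args.ignoreUnrecognized = JNI_FALSE;

    if (JNI_CreateJavaVM(&jvm, (void**)&env, &vm_args) != JNI_OK) {
        printf("Unable to launch JVM\n");
        return 1;
    }
    // ... use the VM, then shut it down ...
    jvm->DestroyJavaVM();
    return 0;
}

One way to verify the option took effect is to call Runtime.getRuntime().maxMemory() through JNI and compare the result against the -Xmx value you passed.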

Related

How to debug a native OpenGL crash in managed code?

I am currently writing a game rendering engine using LWJGL 3 and Kotlin. Everything works fine for several minutes, until, out of nowhere, the program exits with the following message:
Process finished with exit code -1073740940 (0xC0000374)
All I do is load a few models, and then render them with glDrawElements(...) in the main loop - nothing else is loaded or changed.
Now, I know that this error code means heap corruption, but I do not even get an hs_err_pid log file, and the Java debugger just crashes along with the program. So how would I go about debugging such a crash? Could this be due to an incompatibility with Kotlin?
So, for everyone who may find themselves in a similar situation: thanks to the LWJGLX debug tool by Kai Burjack, I instantly found what was crashing the program.
What I did was the following: in the shader class, when uploading a matrix, I allocated a managed FloatBuffer, which I then accidentally tried to free manually:
val buf = BufferUtils.createFloatBuffer(16)
matrix.get(buf)
glUniformMatrix4fv(location, false, buf)
MemoryUtil.memFree(buf)
The MemoryUtil.memFree() call doesn't crash immediately, but since the matrix changes every frame, this method corrupts the heap a little more on each call - hence the crash after a few minutes.
I attached the LWJGLX debugger, and now my program crashed instantly - with a precise error message telling me that I was trying to free a memory region I did not allocate with memAlloc(). So after changing my code to
val buf = MemoryUtil.memAllocFloat(16)
...
MemoryUtil.memFree(buf)
everything now works. I can only recommend the LWJGLX debugger - it also found some memory leaks I now have to fix ;-)

Memory Leak on UWP MediaPlayer (Windows.Media.Playback.MediaPlayer)

I am maintaining a WPF application. I added a UWP media player to my project, but memory usage is too high. I realized the UWP MediaPlayer was responsible, so I created a reproducible example:
while (true)
{
    var mp = new MediaPlayer()
    {
        Source = MediaSource.CreateFromUri(new Uri("Test.mp4"))
    };
    Thread.Sleep(1000);
    mp.Play();
    Thread.Sleep(1000);
    mp.Dispose();
}
This code causes a memory leak. I create a MediaPlayer and dispose of it, yet memory usage grows indefinitely.
How can I track down the memory leak in this code?
This is a .NET Core 3.0 project (XAML Islands with WPF). I haven't tested yet whether it also occurs in a pure UWP project.
Someone said this is natural because it is a loop. But the code below doesn't leak at all, because the GC does its job. (Of course, a few lingering references will not be collected.)
while (true)
{
    new SomeClass();
}
It is definitely a bug in Windows 10 19H1: the built-in Movies & TV app has the same memory leak. To reproduce it, just repeatedly open a video file and close it.
The way your code is written, memory will grow until you run out of it. I verified this in pure UWP as well. If you make the following two changes, memory remains stable and the system reclaims it after each iteration:
Dispose of the MediaSource object you create and assign to the Source property as well
Don't run this in a tight loop; instead, invoke yourself as a dispatcher action
Here is the code (tested in UWP) that doesn't show any leak. In WPF the Dispatcher call would look slightly different:
private async void PlayMedia()
{
    var ms = MediaSource.CreateFromUri(new Uri("ms-appx:///Media1.mp4"));
    var mp = new MediaPlayer()
    {
        Source = ms
    };
    Thread.Sleep(1000);
    mp.Play();
    Thread.Sleep(1000);
    mp.Dispose();
    ms.Dispose();
    await Dispatcher.RunAsync(CoreDispatcherPriority.Normal, new DispatchedHandler(PlayMedia));
}
As a side note: the "SomeClass" comparison you mentioned isn't exactly an apples-to-apples comparison if SomeClass is a pure managed-code class, as the objects you are creating here are complex native Windows Runtime objects that only have a thin managed-code wrapper around them.
I have now also tested in WPF: I reproduced the original memory growth, then applied the suggested changes and verified that the memory no longer grows. Here is my test project for reference: https://1drv.ms/u/s!AovTwKUMywTNuLk9p3frvE-U37saSw
I also ran your shared solution with the WPF app packaged as a Windows App Package, and I am not seeing a leak on the latest released version of Windows 10 (17763.316). Below is a screenshot of the memory diagnostics after running your solution for quite a while. If this is specific to the Insider build you are running, please log a bug via the Feedback Hub. I think at this point we should close this question as answered.

JVM Memory Segments and JIT Compiler

I know this is JVM-dependent and every virtual machine implements it a little differently, yet I want to understand the overall concept.
It has been said that the memory segments the JVM uses to execute a Java program
Java Stacks
Heap
Method Area
PC Registers
Native Method Stacks
are not necessarily implemented as contiguous memory and may all actually be allocated on heap memory provided by the OS. This leads me to my questions.
JVMs that fully use the JIT mechanism and compile bytecode methods into native machine-code methods have to store these methods somewhere - where would that be? The execution engine (usually written in C/C++) has to invoke these JIT-compiled functions, yet the kernel shouldn't allow a program to execute code stored in the stack / heap / static memory segment. How does the JVM overcome this?
My other question regards the Java stacks. When a method (after JIT compilation) executes on the processor, its local variables should be saved on the Java stack. Yet the Java stacks, again, may be implemented with non-contiguous memory, perhaps even as some stack data structure allocated on the heap that merely acts as a stack. How and where do the local variables of an executing method get saved? The kernel shouldn't allow a program to treat heap-allocated memory as a process stack - how does the JVM overcome this difficulty as well?
Again, I want to emphasize that I'm asking about the overall concept; I know each JVM implements this a little differently...
JVMs that fully use the JIT mechanism and compile bytecode methods into native machine-code methods have to store these methods somewhere - where would that be?
It is stored in the JIT code cache, another native memory region. (Class metadata, by contrast, lives in the "PermGen" in Java <= 7 and in the "Metaspace" in Java 8.)
the execution engine (usually written in C/C++) has to invoke these JIT-compiled functions, yet the kernel shouldn't allow a program to execute code stored in the stack / heap / static memory segment. How does the JVM overcome this?
The memory region is both writable and executable, though I don't know exactly which system call is required to implement this.
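For the record, on POSIX systems this is typically done with mmap (or a later mprotect), requesting PROT_EXEC on the pages that will hold generated code. A minimal sketch of the idea, assuming x86-64 Linux (hardened systems enforcing W^X, and modern macOS, require mapping the page writable first and flipping it to executable afterwards):

#include <sys/mman.h>
#include <cstring>
#include <cstdio>

int main() {
    // x86-64 machine code for: mov eax, 42; ret
    unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };

    // Ask the OS for a page that is both writable and executable -
    // the same kind of mapping a JIT uses for its code cache.
    void *mem = mmap(nullptr, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return 1;

    memcpy(mem, code, sizeof(code));          // "emit" the compiled method
    int (*fn)() = reinterpret_cast<int (*)()>(mem);
    printf("%d\n", fn());                     // prints 42

    munmap(mem, 4096);
    return 0;
}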
My other question regards the Java stacks. When a method (after JIT compilation)
Initially the code is not compiled but interpreted; either way, it uses the stack in the same manner.
executes on the processor, its local variables should be saved on the Java stack. Yet the Java stacks, again, may be implemented with non-contiguous memory
There is one stack per thread, and each thread's stack is contiguous.
and perhaps even as some stack data structure allocated on the heap that merely acts as a stack. How and where do the local variables of an executing method get saved?
On the thread stack.
the kernel shouldn't allow a program to treat heap-allocated memory as a process stack - how does the JVM overcome this difficulty as well?
It doesn't do this; thread stacks are ordinary OS thread stacks.
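As an aside, the premise isn't quite right anyway: the kernel doesn't forbid using memory you mapped yourself as a thread stack, and POSIX even has an API for exactly that. A minimal sketch (Linux/macOS):

#include <pthread.h>
#include <sys/mman.h>
#include <cstdio>

static void *work(void *) {
    int local = 42;                     // this local lives on the custom stack below
    printf("local = %d\n", local);
    return nullptr;
}

int main() {
    const size_t stack_size = 1 << 20;  // 1 MiB, a typical thread stack size
    void *stack = mmap(nullptr, stack_size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (stack == MAP_FAILED) return 1;

    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setstack(&attr, stack, stack_size); // hand our own mapping to the thread as its stack

    pthread_t t;
    pthread_create(&t, &attr, work, nullptr);
    pthread_join(t, nullptr);
    pthread_attr_destroy(&attr);
    munmap(stack, stack_size);
    return 0;
}

In practice, though, the JVM simply lets the OS threading library allocate each thread's stack and controls its size with -Xss.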

What can cause different ObjC/ARC memory behaviour between Release and Debug configuration?

I was running a test to make sure objects are being deallocated properly by wrapping the relevant code section in a while loop that runs for 10 seconds. I ran the test in the Debug and Release configurations, with different results.
Debug (Build & Run in simulator):
Release (Build & Run on device, and Profile using Instruments):
The CPU spikes signify where objects are created and destroyed (there are three in each run). Notice how in the Debug build the memory usage rises gradually during the busy loop and then settles a little afterwards at a higher base level; this happens with each loop iteration. In the Release build it stays constant the whole time. After three runs, the memory usage level of the Debug build is significantly higher than that of the Release build. (The CPU spikes are offset on the time axis relative to each other, but that's just because I pressed the button that triggers the loop at different times.)
The inner loop code in question is very simple and basically consists of a bunch of correctly paired malloc and free statements, as well as a bunch of retain and release calls (courtesy of ARC, also verified as correctly paired).
Any idea what is causing this behaviour?
In Release builds, ARC does its best to keep objects out of the autorelease pool. It does this via the objc_autoreleaseReturnValue / objc_retainAutoreleasedReturnValue handoff, checking for it at runtime.
A lot of Cocoa Touch classes use caching to improve performance. The amount of memory used for caching data can vary depending on total memory, available memory, and probably other factors. Since you are comparing results from the simulator (on your Mac) and a device, it is not surprising that you get different results.
Some examples of classes/methods that use caching:
+(UIImage *)imageNamed:(NSString *)name
Discussion
This method looks in the system caches for an image object with the specified name and returns that object if it exists. If a matching image object is not already in the cache, this method loads the image data from the specified file, caches it, and then returns the resulting object.
NSURLCache
The NSURLCache class implements the caching of responses to URL load requests by mapping NSURLRequest objects to NSCachedURLResponse objects. It provides a composite in-memory and on-disk cache.
For one thing, Release builds optimize the code and strip debugging information from it. As a result, the application package is significantly smaller, and less memory is needed to load it.
I suppose most of the extra memory used in Debug builds is the actual debugging information, zombie tracking, etc.

How to add memory to the heap at runtime?

I am using Keil's ARM-MDK 4.11. I have a statically allocated block of memory that is used only at startup. It is used before the scheduler is initialised and, due to the way RL-RTX takes control of heap management, cannot be dynamically allocated (otherwise subsequent allocations after the scheduler starts cause a hard fault).
I would like to add this static block as a free block to the system heap after the scheduler is initialised. It would seem that __Heap_ProvideMemory() might provide the answer; it is called during initialisation to create the initial heap. However, that would require knowledge of the heap descriptor address, and I can find no documented method of obtaining that.
Any ideas?
I have raised a support request with ARM/Keil for this, but they are more interested in questioning why I would want to do this and in offering alternative solutions. I am well aware of the alternatives, but in this case, if it could be done, it would be the cleanest solution.
We use the Rowley Crossworks compiler but had a similar issue - the heap was being set up in the compiler CRT startup code. Unfortunately the SDRAM wasn't initialised till the start of main() and so the heap wasn't set up properly. I worked around it by reinitialising the heap at the start of main(), after the SDRAM was initialised.
I looked at the assembler code that the compiler uses at startup to work out the structure - it wasn't hard. Subsequently I have also obtained the malloc/free source code from Rowley - perhaps you could ask Keil for their version?
One method I've used is to incorporate my own simple heap routines and take over the malloc()/calloc()/free() functions from the library.
The simple, custom heap routines had an interface that allowed adding blocks of memory to the heap.
The drawback to this (at least in my case) was that the custom heap routines were far less sophisticated than the built-in library routines and were probably more prone to fragmentation than the built-in routines. That wasn't a serious issue in that particular application. If you want the capabilities of the built-in library routines, you could probably have your malloc() defer to the built-in heap routines until it returns a failure, then try to allocate from your custom heap.
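Here is a minimal sketch of that fallback arrangement. The names are placeholders: __real_malloc stands for however your toolchain exposes the original allocator (GNU ld provides the __wrap_/__real_ pair via --wrap=malloc; the ARM linker has a similar $Sub$$/$Super$$ mechanism), and custom_heap_alloc is your own routine that carves allocations out of the added memory block:

#include <stddef.h>

extern void *__real_malloc(size_t size);   // the built-in library allocator (placeholder name)
void *custom_heap_alloc(size_t size);      // your allocator for the added block (placeholder)

void *__wrap_malloc(size_t size) {
    void *p = __real_malloc(size);         // try the built-in heap first
    if (p == NULL)
        p = custom_heap_alloc(size);       // on failure, fall back to the custom heap
    return p;
}

free() would need the matching logic, returning each pointer to whichever heap it came from (for example, by testing whether the address falls inside the custom block).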
Another drawback is that I found it much more painful to make sure the custom routines were bug-free than I thought it would be at first glance, even though I wasn't trying to do anything too fancy (just a simple list of free blocks that could be split on allocation and coalesced when freed).
The one benefit of this technique is that it's pretty portable (as long as your custom routines are portable) and doesn't break if the toolchain changes its internals. The only part that requires porting is taking over the malloc()/free() interface and making sure your routines get initialized early enough.