How to properly assign huge heap space for JVM - jvm

Im trying to work around an issue which has been bugging me for a while. In a nutshell: on which basis should one assign a max heap space for resource-hogging application and is there a downside for tit being too large?
I have an application used to visualize huge medical datas, which can eat up to several gigabytes of memory if several imaging volumes are opened size by side. Caching the data to be viewed is essential for fluent workflow. The software is supported with windows workstations and is started with a bootloader, which assigns the heap size and launches the main application. The actual memory needed by main application is directly proportional to the data being viewed and cannot be determined by the bootloader, because it would require reading the data, which would, ultimately, consume too much time.
So, to ensure that the JVM has enough memory during launch we set up xmx as large as we dare based, by current design, on the max physical memory of the workstation. However, is there any downside to this? I've read (from a post from 2008) that it is possible for native processes to hog up excess heap space, which can lead to memory errors during runtime. Should I maybe also sniff for free virtualmemory or paging file size prior to assigning heap space? How would you deal with this situation?
Oh, and this is my first post to these forums. Nice to meet you all and be gentle! :)
Update:
Thanks for all the answers. I'm not sure if I put my words right, but my problem rose from the fact that I have zero knowledge of the hardware this software will be run on but would, nevertheless, like to assign as much heap space for the software as possible.
I came to a solution of assigning a heap of 70% of physical memory IF there is sufficient amount of virtual memory available - less otherwise.

You can have heap sizes of around 28 GB with little impact on performance esp if you have large objects. (lots of small objects can impact GC pause times)
Heap sizes of 100 GB are possible but have down sides, mostly because they can have high pause times. If you use Azul Zing, it can handle much larger heap sizes significantly more gracefully.
The main limitation is the size of your memory. If you heap exceeds that, your application and your computer will run very slower/be unusable.
A standard way around these issues with mapping software (which has to be able to map the whole world for example) is it break your images into tiles. This way you only display the image which is one the screen (or portions which are on the screen) If you need to be able to zoom in and out you might need to store data at two to four levels of scale. Using this approach you can view a map of the whole world on your phone.

Best to not set JVM max memory to greater than 60-70% of workstation memory, in some cases even lower, for two main reasons. First, what the JVM consumes on the physical machine can be 20% or more greater than heap, due to GC mechanics. Second, the representation of a particular data entity in the JVM heap may not be the only physical copy of that entity in the machine's RAM, as the OS has caches and buffers and so forth around the various IO devices from which it grabs these objects.

Related

mongodb high cpu usage

I have installed MongoDB 2.4.4 on Amazon EC2 with ubuntu 64 bit OS and 1.6 GB RAM.
On this server, only MongoDB running nothing else.
But sometime CPU usage reach to 99% and load average: 500.01, 400.73,
620.77
I have also installed MMS on server to monitor what's going on server.
Here is MMS detail
As per MMS details, indexing working perfectly for each queries.
Suspect details as below
1) HIGH non-mapped virtual memory
2) HIGH page faults
Can anyone help me to understand what exactly causing high CPU usage ?
EDIT:
After comments of #Dylan Tong, i have reduced active connetions but
still there is high non-mapped virtual memory
Here's a summary of a few things to look into:
1. Observed a large number of connections and cursors (13k):
- fix: make sure your connection pool is appropriate. For reporting, and your current request rate, you only need a few connections at most. Also, I'm guessing you have a m1small instance, which means you only have 1 core.
2. Review queries and indexes:
- run your queries with explain(), to observe how the queries are executed. The right model normally results in queries only pulling very few documents and utilization of an index.
3. Memory (compact and readahead setting):
- make the best use of memory. 1.6GB is low. Check how much free memory you have, and compare it to what is reported as resident. A couple of common causes of low resident memory is due to fragmentation. If there are alot of documents moving, changing size and such, you should run the compact command to defragment your data files. Also, a bad readahead can lead to poor use of memory as well. Check your readahead setting (http://manpages.ubuntu.com/manpages/lucid/man2/readahead.2.html). Try a few values starting with low values (http://docs.mongodb.org/manual/administration/production-notes/). The production notes recommend 32 (for standard 512byte blocks). Sometimes higher values are optimal if your documents are larger. The hope is that resident memory should be close to your available memory and your page faults should start to lower.
If you're using resources to the fullest after this, and you're still capped out on CPU then it means you need to up your resources.

Is it meaningful to monitor physical memory usage on AIX?

Due to AIX's special memory-using algorithm, is it meaning to monitor the physical memory usage in order to find out the memory bottleneck during performance tuning?
If not, then what kind of KPI am i supposed to keep eyes on so as to determine whether we need to enlarge the RAM capacity or not?
Thanks
If a program requires more memory that is available as RAM, the OS will start swapping memory sections to disk as it sees fit. You'll need to monitor the output of vmstat and look for paging activity. I don't have access to an AIX machine now to illustrate with an example, but I recall the man page is pretty good at explaining what data is represented there.
Also, this looks to be a good writeup about another AIX specfic systems monitoring tool, and watching your systems overall memory (svgmon).
http://www.aixhealthcheck.com/blog.php?id=255
To track the size of your individual application instance(s), there are several options, with the most common being ps. Again, you'll have to check the man page to get information on which options to use. There are several columns for memory sz per process. You can compare those values to the overall memory that's available on your machine, and understand, by tracking over time, if your application is only increasing is memory, or if it releases memory when it is done with a task.
Finally, there's quite a body of information from IBM on performance tuning for AIX, but I was never able to find a road map guide to reading that information. A lot of it assumes you know facts and features that aren't explained in the current doc set, so you then have to try and find an explanation, which oftens leads to searching for yet another layer of explanations. ! :^/
IHTH.

Heap profiling on ARM

I am developing a GUI-heavy C++ application on a Freescale MX51-based board Linux 2.6.35. I would like to perform heap profiling.
Unfortunately, all heap profiling tools I have found have either been too intrusive or ostensibly non-working on ARM. Specific tools I've tried:
Valgrind Massif: unworkable on my platform due to the platform's feeble CPU. The 80% CPU time overhead introduced by Massif causes a range of problems in my application that cannot be compensated for.
gperftools (formerly Google Performance Tools) tcmalloc: All features of this rather un-intrusive, library-based libc malloc() replacement work on my target except for the heap profiler. To rephrase, the thread caching allocator works but the profiler does not. I'll explain the failure mode of the profiler below for anyone curious.
Can anyone suggest a set of replacement tools for performing C++ heap profiling on ARM platforms? Ideal output would ultimately be a directed allocation graph, similar to what gperftools' tcmalloc outputs. Low resource utilization is a must- my platform is highly resource constrained.
Failure mode of gperftools' tcmalloc explained:
I'm providing this information only for those that are curious; I do not expect a response. I'm seeing something similar to gperftools' issue #407 below, except on ARM rather than x86.
Specifically, I always get the message "Hooked allocator frame not found, returning empty trace." I spent some time debugging the issue and it appears that, when dynamically linking the tcmalloc library, frame pointers at the boundary between my application and the dynamic library are null- the stack cannot be walked "above" the call into the dynamic library.
gperftools issue #407: https://github.com/gperftools/gperftools/issues/410
stackoverflow user seeing similar problems on ARM: Missing frames on shared libraries on ARM
Heaps. Many ways to do them, but I've only run across 3 main types that matter in embedded land:
Linked list heaps. Each alloc is tracked in a "used" list. Once freed, they are dropped into a "free" list. On freeing, adjacent blocks of free memory are "joined" into larger pieces. Allocs can be any size. Each alloc and free is a O(N) op as it has to traverse the free list to give you a piece of memory plus break the free block into a size close to what you asked for while leaving the remaining block in the free list. Because of the increasing overhead per alloc, this system cannot be used by itself on smaller systems. This also tends to cause memory fragmentation over time if steps aren't taken to minimize it.
Fixed size (unit) heaps. You break your heap into equal size (smaller) parts. This wastes memory a bit, depending on how big the chunks are (and how many different sized, fixed allocator heaps you create), but alloc and free are both O(1) time operations. No searching, no joining. This style is often combined with the first one for "small object allocations" as the engines I've worked with have 95% of their allocations below a set size (say 256 bytes). This way, you use the unit heap for small allocs for huge speed and only minimal memory loss, while using the list heap for larger allocs. No external fragmentation of memory either.
Relocatable memory heaps. You don't give out pointers to memory, but handles. That way, behind the scenes, you can change memory pointers when needed to remove fragmentation or whatever. High overhead. High pain the the #$$ quotient as it's easy to abuse and get dangling pointer all over. Also added overhead for each memory dereference. But wanted to mention it.
There's some basic patterns. You can find all sorts of libs out in the wild that use them and also have built in statistics for number of allocs, fragmentation, and other useful stats. It's also not the hard to roll your own really, though I'd not recommend it for anything outside of satisfying curiosity as debugging without a working malloc is painful indeed. Adding thread support is pretty straightforward as well, but again, downloading a ready made solution is the better choice.
The above info applies to all platforms, ARM or otherwise, though most of my experience has been on low level ARM stuff so the above info is battle tested for your platform. Hope this helps!

Physical Memory and Virtual Memory data allocation behavior

Im interested in understanding how a computer allocates variables for physical memory vs files in virtual memory ( such as on a hard drive ), in terms of how does the computer determine know where to put data. It almost seems random in both memory storage types, but its not because it simply can't put data at a memory address or sector (any location) of a hard drive that's occupied or allocated for another process already. When I was studying how Norton's speed disk ( a program that de-fragments files on hard drives ) on my old W95 system, I noticed from the program's representation of hard drive's data ( a color coded visual map of different data types, e.g. swap files were always first at the top.), consisting of many files spread out all over the hard drive with empty unused areas. In addition some of these areas, I saw what looked like a mix of data and empty space showed a spotty pattern. I want to think its random for that to happen. Like wise, when I was studying the memory addresses of a simple program I wrote in C, I noticed that each version of my program after recompiling it after changes - showed different addresses for segments and offsets. I was expecting the computer to use the same address when I recompiled it. Sometimes the same address would be used, other times it was different. Again, I want to think its random also for memory locations to be chosen by programs. I thought that memory allocation or file writing was based on the first empty space available, written in a contiguous manner.
So my question is, I want to know how and what is it in the logic works of a common computer, that decides where it writes its data in such a arbitrary manner for either type of location (physical RAM or Dynamic )? What area of computer science (if not assembly language) would I need to study that would explains this, almost random behavior?
Thanks in Advance
Something broader and directly from computer science would be a linked list. http://en.wikipedia.org/wiki/Linked_list
Imagine if you had a linked list and simply added items to the end, these items might live linearly in memory or disk or whatever somewhere. But as you remove some items in the middle of the list by having say item number 7 point at item number 9 eliminating item number 8. As with memory allocation for allocs or virtual memory or hard drive sector allocation, etc how fast you fragment your storage has to do with the algorithm you use for allocating the next item.
file systems can/do use a link list type scheme to keep track of what sectors are tied to a single file. it is fast and easy to use the link list but deal with fragmentation. A much slower method would be to have no fragmentation but be constantly copying/moving files around to keep them on linear sectors.
malloc() allocation schemes and MMU allocation schemes also fall under this category. Basically any time you take something, slice it up into fractions and put a virtual interface in front of those fractions to give the appearance to the programmer/user that they are linear. Malloc() (not counting the virtual memory via the MMU) is the other way around allocating a number of linear chunks of those fractions to meed the alloc need, and having an alloc/free scheme that attempts to keep as many large chunks available, just in case, a bad malloc system is one where you have half of your memory free but the maximum malloc that works without an out of memory error is a malloc of a small fraction of that memory, say you have a gig free and can only allocate 4096 bytes.
You should look at virtual memory and TLB (translation lookaside buffer) or paging.
It is not trivial to implement virtual memory and paging. The performance of your whole system depends on it. If it's not done properly your system will thrash.
It is early morning here so Wikipedia will have to do for now: https://en.m.wikipedia.org/wiki/Translation_lookaside_buffer
EDIT:
Those coloured spots you saw in your defrag were chunks on your HDD. Each chunk is of some specified size. Depending on how fragmented your HDD is, you might have portions of your HDD that look like this:
*-*-***-***-*
where * means full, and - means empty
This (above) could be part of one application/file or multiple files; I will assume one file is split across those to simplify my example. At the end of each * there is a pointer to the next location where the next * chunk is (this is called a linked list). The more fragmented your HDD is (or memory) the more of these pointers to next chunk you will have. This in turn uses more space for next pointers instead of using space for data and the result is more overhead when reading that data. If this is a file on disk, you will have multiple seeks (which are bad because they're slow) if your data is not grouped together (locality principle). When you use defrag, it moves and groups all chunks together (as best as it can).
*-*-***-***-*
becomes
*********----
The OS decides paging and virtual memory addressing (and such). TLB is a hardware (a cache) that aids this process (it maps physical memory to virtual memory addresses for fast look up). The CPU communicates with the TLB via MMU
To answer your questions
You should study operating systems.
Yes the locations where to place your files on HDD are decided by the OS. If you deleted a file and download it again, there is no guarantee it will be placed in the same location-most likely not.
A nice summary of all these components and principles I mentioned here work: Click Here. It's a ppt with slides from a Real Time Operating Systems book (if I'm not mistaken the same exact one I used)

How does memory use affect battery life?

How does memory allocation affect battery usage? Does holding lots of data in variables consume more power than performing many iterations of basic calculations?
P.S. I'm working on a scientific app for mac, and want to optimize it for battery consumption.
The amount of data you hold in memory doesn't influence the battery life as the complete memory has to be refreshed all the time, whether you store something there or not (the memory controller doesn't know whether a part is "unused", AFAIK).
By contrast, calculations do require power. Especially if they might wake up the CPU from an idle or low power state.
I believe RAM consumption is identical regardless of whether it's full or empty. However more physical RAM you have in the machine the more power it will consume.
On a mac, you will want to avoid hitting the hard drive, so try to make sure you don't read the disk very often and definitely don't consume so much RAM you start using virtual memory (or push other apps into virtual memory).
Most modern macs will also partially power down the CPU(s) when they aren't very busy, so reducing CPU usage will actually reduce power consumption.
On the other hand when your app uses more memory it pushes other apps cache data out of the memory and the processing can have some battery cost if the user decides to switch from one to the other, but that i think will be negligible.
it's best to minimize your application's memory footprint once it transitions to the background simply to allow more applications to hang around and not be terminated. Also, applications are terminated in descending order of memory size, so if your application is the largest one existing in the background, it will be killed first.