What is the optimal free-heap to total-heap ratio? At what values of this ratio should I consider increasing or decreasing the heap size?
The ideal momentary ratio is 1. Ideally, your JVM would consume exactly the memory it required, no more and no less. That's a very hard target to reach ;)
The problem (as TNilsson points out) is that your application's memory requirements change over time as it does work, so you want enough headroom that collection/compaction doesn't happen more often than you can tolerate, and a small enough footprint that you don't have to buy more RAM.
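If you want to watch that ratio in a running JVM, here is a minimal sketch using the standard Runtime API; the class name and the 0.2/0.7 thresholds are purely illustrative, not recommendations:

```java
// Rough sketch: sample the free/total heap ratio via the Runtime API.
// The 0.2 / 0.7 thresholds below are illustrative, not tuning advice.
public class HeapRatioMonitor {
    public static void main(String[] args) throws InterruptedException {
        Runtime rt = Runtime.getRuntime();
        while (true) {
            long total = rt.totalMemory();   // heap currently reserved by the JVM
            long free  = rt.freeMemory();    // unused portion of that reservation
            double ratio = (double) free / total;
            System.out.printf("free/total = %.2f (max = %d MB)%n",
                    ratio, rt.maxMemory() / (1024 * 1024));
            if (ratio < 0.2) System.out.println("  -> heap under pressure, consider raising -Xmx");
            if (ratio > 0.7) System.out.println("  -> lots of slack, -Xmx could probably be lowered");
            Thread.sleep(5_000);
        }
    }
}
```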
There is no single easy answer, let me give you two examples:
Example 1 - Your program allocates 100M worth of memory at startup, and then does not allocate any memory whatsoever for the rest of its run.
In this case, you clearly want to have a heap size of 100M (Well, perhaps 101 or something, but you get the point...) to avoid wasting space.
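In practice you would just pin the heap near that footprint with the standard -Xms/-Xmx flags; a hedged example, where the jar name and the exact sizes are placeholders:

```
# Fix the heap at roughly the steady-state footprint, with a little headroom.
java -Xms100m -Xmx110m -jar myapp.jar
```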
Example 2 - Your program allocates 10M of memory per second. None of the data is persisted longer than 1 second. (e.g. you are doing a calculation that requires a lot of temporary data, and will return a single integer when you are done...)
Knowing the exact numbers is perhaps not so realistic, but it's an example.
Since you have 10M of "live" data, you will need at least a 10M heap. Beyond that, you need to check how your garbage collector works. Simplified, the time a GC takes to complete is O(live set); that is, the amount of "dead" data does not really enter into it. With a constant live-set size, your GC time is constant no matter how large the heap is, which leads to: larger heap -> better throughput.
(Now, to really mess things up, add stuff like compaction of the heap and the picture becomes even less clear...)
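If you want to experiment with the throughput argument yourself, here is a rough sketch of Example 2's allocation pattern; the class name and numbers are made up, and it only approximates 10M/s:

```java
// Allocates roughly 10 MB of short-lived garbage per second.
// Run it with e.g. -Xmx32m and then -Xmx512m and compare GC frequency
// (add -verbose:gc or -Xlog:gc to see the collections happen).
public class TempDataDemo {
    public static void main(String[] args) throws InterruptedException {
        long checksum = 0;
        while (true) {
            byte[] scratch = new byte[1024 * 1024];      // 1 MB of temporary data
            scratch[scratch.length - 1] = 1;
            checksum += scratch[scratch.length - 1];     // read it so the allocation isn't dead code
            Thread.sleep(100);                           // ~10 allocations/second => ~10 MB/s
            if (checksum < 0) System.out.println(checksum); // practically never true; keeps checksum live
        }
    }
}
```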
Conclusion
It's a simplified version of the matter, but the short answer is - It depends.
This probably depends on the rate at which you allocate new objects. Garbage collection involves a lot of work tracing references from live objects. I have just been dealing with a situation where there was plenty of free memory (say 500 MB used, 500 MB free) but so much array allocation was happening that the JVM would spend 95% of its time doing GC. So don't forget about the runtime memory behaviour.
All those performance tuning articles that say something like "object allocation is really fast in Java" without mentioning that some allocations cause 1 second of GC time make me laugh.
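If a profiler shows that the allocation rate itself is the culprit, the usual remedy is to reuse buffers instead of allocating fresh ones in the hot path. A minimal, hypothetical sketch of the difference:

```java
// Hypothetical example: the first method allocates a fresh work array on every
// call, which at high call rates keeps the collector busy; the second reuses
// one preallocated buffer, at the cost of making the method non-reentrant.
class Processor {
    private static final int MAX_N = 1 << 20;
    private final double[] reusableWork = new double[MAX_N];

    double[] processAllocating(int n) {
        double[] work = new double[n];                   // new garbage on every call
        for (int i = 0; i < n; i++) work[i] = i * 0.5;
        return work;
    }

    void processReusing(int n) {
        java.util.Arrays.fill(reusableWork, 0, n, 0.0);  // reset only the slice we use
        for (int i = 0; i < n; i++) reusableWork[i] = i * 0.5;
        // read results out of reusableWork[0..n) before the next call
    }
}
```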
Related
I am looking into the performance overheads of bincode::deserialize (and friends). The deserialized data in question is a large BTreeMap<String, Vec<u8>>. The sizes involved are fairly big (~2 MB), and deserialization is a very frequent operation. Profiling indicates a good chunk of time going into heap operations.
I would like to reuse the previously deserialized area for the next operation (I am hoping this would significantly cut the heap overhead, but I don't have supporting data yet). deserialize_in_place appears to be a candidate, but it doesn't look so straightforward to use.
Looking for any ideas or alternatives.
Thanks.
I'm writing an application where performance is fairly critical. I'm a bit confused as to which is the most efficient data type for x64 CPUs.
MSDN says that "In some cases, the common language runtime can pack your Short variables closely together and save memory consumption." but also that "The Integer data type provides optimal performance on a 32-bit processor".
I'm using a huge amount of data (on average around 5 million values in a jagged array [10 or more][30][128,128]) to generate bitmaps in real time (heat maps of the data values). All of the data points are whole numbers between 200 and 3500, so I can use Short or Integer. Which would be most efficient?
Thanks.
The Int32 type is most efficient for regular variables, for example loop counters, in both 32-bit and 64-bit applications.
When you handle large arrays of data, the efficiency of reading/writing a single value doesn't matter much; what matters is accessing the data so that you get as few memory cache misses as possible. A memory cache miss is very expensive compared to an access to cached memory. (Also, a page fault (memory swapped to disk) is very expensive compared to a memory cache miss.)
To avoid cache misses you should store the data as compactly as possible, and when you process it, access it as linearly as possible so that the memory region you touch at any one time is as small as possible.
Using Int16 is likely to be more efficient than Int32 for any array too large to fit in cache: it halves the footprint, so each cache line (typically 64 bytes) holds twice as many values and half as many lines have to be fetched.
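To make the size difference concrete, here is a small sketch in Java (the same arithmetic applies to .NET's Int16/Int32); the array length mirrors the ~5 million values mentioned in the question:

```java
// A short[] halves the memory the data occupies compared with an int[],
// so a linear pass touches half as many cache lines.
public class FootprintDemo {
    public static void main(String[] args) {
        int n = 5_000_000;
        short[] asShort = new short[n];   // ~10 MB of payload
        int[]   asInt   = new int[n];     // ~20 MB of payload
        System.out.printf("short[]: ~%d MB, int[]: ~%d MB%n",
                (long) n * Short.BYTES / (1024 * 1024),
                (long) n * Integer.BYTES / (1024 * 1024));
        // Linear traversal keeps the hardware prefetcher happy; a 64-byte
        // cache line serves 32 shorts but only 16 ints.
        long sum = 0;
        for (short v : asShort) sum += v;
        for (int v : asInt)     sum += v;
        System.out.println(sum);
    }
}
```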
As your values fit in just 12 bits, it might even be more efficient to store each value in 1.5 bytes, even though that means more processing to handle the data. The 25% reduction in data size might more than make up for the extra processing.
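A hedged sketch of that 1.5-bytes-per-value idea, packing two 12-bit values into three bytes (Java syntax here, but the bit twiddling is identical in .NET):

```java
// Packs two 12-bit values (0..4095) into 3 bytes and unpacks them again.
public final class TwelveBitPacking {
    static void packPair(int a, int b, byte[] out, int off) {
        out[off]     = (byte) (a >>> 4);                        // high 8 bits of a
        out[off + 1] = (byte) (((a & 0x0F) << 4) | (b >>> 8));  // low 4 of a, high 4 of b
        out[off + 2] = (byte) b;                                // low 8 bits of b
    }

    static int unpackFirst(byte[] in, int off) {
        return ((in[off] & 0xFF) << 4) | ((in[off + 1] & 0xF0) >>> 4);
    }

    static int unpackSecond(byte[] in, int off) {
        return ((in[off + 1] & 0x0F) << 8) | (in[off + 2] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] buf = new byte[3];
        packPair(3500, 200, buf, 0);
        System.out.println(unpackFirst(buf, 0) + " " + unpackSecond(buf, 0)); // 3500 200
    }
}
```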
As a general rule, the less memory a variable uses, the faster it will be processed, and you will have better memory management because your application will use less of it.
Short only needs half the memory that Integer needs; if you only need a 16-bit number and you are sure it will never be bigger, use Short.
I have a slight problem where, when the user plays my game for more than 20 minutes or so, it begins to slow quite considerably. I have been trying to work through the issues pretty hard as of late, but still no luck. I have tried the Leaks instrument and I now have that squared away, but I read at "bbum's weblog" about using the Allocations instrument and taking heap shots. But I don't quite understand what I am looking at; could someone give me a hand with this?
My game involves users selecting words. I took a heap shot after each word was selected, but I am not too sure exactly how to read this. Is the Heap Growth column what is currently running, or is it what has been added to what is currently running?
And what is the # Persistent?
Also, why does the # Persistent jump so much? Could that be my memory problem?
Thanks for all the help!
The heap growth column represents all of the allocations in that iteration that did not exist prior to that iteration but continue to exist in all subsequent iterations.
I.e. Heapshot 4 shows a 10.27KB permanent growth in the size of your heap.
If you were to take an additional Heapshot and any of the objects in any of the previous iterations were deallocated for whatever reason, the corresponding iteration's heapshot would decrease in size.
In this case, the heapshot data is not going to be terribly useful. Sure; you can dive in and look at the various objects sticking around, but you don't have a consistent pattern across each iteration.
I wrote considerably more about this in a weblog post.
If it's slowing down, why not try CPU profiling instead? Unless you're getting memory warnings, what makes you think it's a leak?
Tim's comment is correct in that you should be focusing on CPU usage. However, it is quite reasonable to assume that an app is slowing down because of the increased algorithmic cost associated with a growing working set; i.e. if there are more objects in memory, and those objects are still in use, then it takes more time to muck with 'em.
That isn't the case here; your heap isn't growing that significantly and, thus, it sounds like you have a pure algorithmic issue if your app is truly slowing down.
Does your game save data to NSUserDefaults or to any arrays? If so, as the game is played and more and more stuff is added to the array, it takes longer to loop through it, hence gradually slowing down the game.
I'm thinking of optimizing a program by taking a linear array and writing each element to an arbitrary location (random-like from the perspective of the CPU) in another array. I am only doing simple writes and not reading the elements back.
I understand that a scattered read for a classical CPU can be quite slow, as each access will cause a cache miss and thus a processor wait. But I was thinking that a scattered write could technically be fast, because the processor isn't waiting for a result and thus may not have to wait for the transaction to complete.
I am unfortunately unfamiliar with all the details of the classical CPU memory architecture and thus there may be some complications that may cause this also to be quite slow.
Has anyone tried this?
(I should say that I am trying to invert a problem I have. I currently have a linear array from which I am reading arbitrary values -- a scattered read -- and it is incredibly slow because of all the cache misses. My thought is that I can invert this operation into a scattered write for a significant speed benefit.)
In general you pay a high penalty for scattered writes to addresses which are not already in cache, since you have to load and store an entire cache line for each write, hence FSB and DRAM bandwidth requirements will be much higher than for sequential writes. And of course you'll incur a cache miss on every write (a couple of hundred cycles typically on modern CPUs), and there will be no help from any automatic prefetch mechanism.
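If you want to see the effect for yourself, here is a rough micro-benchmark sketch (Java, with arbitrary array sizes; a serious measurement would need proper warm-up and a harness such as JMH):

```java
import java.util.Random;

// Very rough comparison of sequential vs scattered writes into a large array.
// Not rigorous benchmarking, just an illustration; may need e.g. -Xmx1g to run.
public class ScatterWriteDemo {
    public static void main(String[] args) {
        int n = 1 << 25;                    // 32M ints = 128 MB, far larger than any cache
        int[] dst = new int[n];
        int[] idx = new int[n];
        Random rnd = new Random(42);
        for (int i = 0; i < n; i++) idx[i] = rnd.nextInt(n);

        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) dst[i] = i;          // sequential writes
        long t1 = System.nanoTime();
        for (int i = 0; i < n; i++) dst[idx[i]] = i;     // scattered writes
        long t2 = System.nanoTime();

        System.out.printf("sequential: %d ms, scattered: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
        System.out.println(dst[rnd.nextInt(n)]);          // keep the writes from being optimized away
    }
}
```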
I must admit, this sounds kind of hardcore. But I'll take the risk and answer anyway.
Is it possible to divide the input array into pages and read/scan each page multiple times? On every pass through a page, you only process (or output) the data that belongs to a limited set of destination pages. This way you only get cache misses at the start of each input-page loop.
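A hedged sketch of that multi-pass idea (block size and data shapes are made up): each pass over the input only performs the writes that land in the current destination block, so the write working set stays cache-sized, at the cost of re-reading the input several times.

```java
import java.util.Arrays;

// Multi-pass scatter: on each pass, only writes whose destination falls inside
// the current block are performed, so the output working set fits in cache.
public class BlockedScatter {
    static void scatterBlocked(int[] src, int[] destIndex, int[] dst, int blockSize) {
        for (int blockStart = 0; blockStart < dst.length; blockStart += blockSize) {
            int blockEnd = Math.min(blockStart + blockSize, dst.length);
            for (int i = 0; i < src.length; i++) {       // re-scan the (cache-friendly) input
                int d = destIndex[i];
                if (d >= blockStart && d < blockEnd) {
                    dst[d] = src[i];
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] src = {1, 2, 3, 4};
        int[] destIndex = {3, 0, 2, 1};
        int[] dst = new int[4];
        scatterBlocked(src, destIndex, dst, 2);
        System.out.println(Arrays.toString(dst));        // [2, 4, 3, 1]
    }
}
```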
When examining a process in Process Explorer, what does it mean when there are a lot of page faults? The application is processing quite a bit of data and the UI is not very responsive. Are there optimizations to the code that could reduce or eliminate page faults? Would increasing the physical RAM of the system make a difference?
http://en.wikipedia.org/wiki/Page_fault
Increasing the physical RAM on your machine could result in fewer page faults, although design changes to your application will do much better than adding RAM. In general, having a smaller memory footprint, and having things that are often accessed around the same time live on the same page, will decrease the number of page faults. It can also be helpful to try to do everything you can with a piece of data while it is in memory, so that you don't need to access it many separate times, which may cause page faults (aka thrashing).
It might also be helpful to make sure that memory that is accessed together is located close together (e.g. if you have some objects, place them in an array). If those objects have lots of data that is very infrequently used, move it into another class and give the first class a reference to the second one. This way you will use less memory most of the time.
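A small hypothetical sketch of that hot/cold split (class and field names are mine):

```java
// Hypothetical hot/cold split: fields touched on every update stay in the small
// "hot" class; bulky, rarely used data moves behind a reference so the frequently
// walked objects occupy less memory (and therefore fewer pages).
class Particle {
    float x, y, vx, vy;            // read/written constantly ("hot")
    ParticleDetails details;       // bulky data, fetched only when actually needed ("cold")
}

class ParticleDetails {
    String debugName;
    byte[] history = new byte[4096];
}
```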
A design option would be to write a memory cache system that creates memory lazily (on demand). Such a cache would hold a collection of pre-allocated memory chunks, accessed by their size: for example, an array of N lists, each list holding M buffers, where list i is responsible for handing you memory in a certain size range (say, buffers of size 2^i for i = 0..N-1). Even if you want to use less than 2^i bytes, you just don't use the extra memory in the buffer.
This trades a small amount of wasted memory for better caching and fewer page faults.
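A minimal sketch of such a size-class cache (structure and names are mine; this is an illustration, not a production allocator):

```java
import java.util.ArrayDeque;

// Hands out pre-allocated byte buffers by power-of-two size class and takes them
// back for reuse, so steady-state operation does little or no fresh allocation.
public class BufferCache {
    private static final int CLASSES = 24;                    // size classes 2^0 .. 2^23
    private final ArrayDeque<byte[]>[] freeLists;

    @SuppressWarnings("unchecked")
    public BufferCache() {
        freeLists = new ArrayDeque[CLASSES];
        for (int i = 0; i < CLASSES; i++) freeLists[i] = new ArrayDeque<>();
    }

    private static int classFor(int size) {
        return 32 - Integer.numberOfLeadingZeros(Math.max(1, size) - 1);  // ceil(log2(size))
    }

    public byte[] acquire(int size) {
        int c = classFor(size);
        if (c >= CLASSES) return new byte[size];              // too big to pool; allocate directly
        byte[] buf = freeLists[c].pollFirst();
        return (buf != null) ? buf : new byte[1 << c];        // allocate lazily on first use
    }

    public void release(byte[] buf) {
        int c = classFor(buf.length);
        if (c < CLASSES && buf.length == (1 << c)) {          // only pool exact power-of-two buffers
            freeLists[c].addLast(buf);
        }
    }
}
```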
Another option is to use nedmalloc.
Good luck,
Lior