Trying to work out some leaks for iPad game - objective-c

I have a problem where, when the user plays my game for more than 20 minutes or so, it begins to slow quite considerably. I have been trying to work through the issue pretty hard as of late, but still no luck. I have tried the Leaks instrument and now have that squared away, but I read on bbum's weblog about using the Allocations instrument and taking heap shots. I don't quite understand what I am looking at, though; could someone give me a hand with it?
My game involves users selecting words. I took a heap shot after each word was selected, but I am not sure exactly how to read the results. Is the Heap Growth column what is currently allocated, or what has been added to what is currently allocated?
And what is # Persistent?
Also, why does # Persistent jump so much? Could that be my memory problem?
Thanks for all the help!

The heap growth column represents all of the allocations in that iteration that did not exist prior to that iteration but continue to exist in all subsequent iterations.
I.e. Heapshot 4 shows a 10.27KB permanent growth in the size of your heap.
If you were to take an additional Heapshot and any of the objects in any of the previous iterations were deallocated for whatever reason, the corresponding iteration's heapshot would decrease in size.
In this case, the heapshot data is not going to be terribly useful. Sure, you can dive in and look at the various objects sticking around, but you don't have a consistent pattern across each iteration.
I wrote considerably more about this in a weblog post.
If it's slowing down, why not try CPU profiling instead? Unless you're getting memory warnings, what makes you think it's a leak?
Tim's comment is correct in that you should be focusing on CPU usage. However, it is quite often the case that an app slows down because of the increased algorithmic cost associated with a growing working set. I.e. if there are more objects in memory, and those objects are still in use, then it takes more time to muck with 'em.
That isn't the case here; your heap isn't growing that significantly and, thus, it sounds like you have a pure algorithmic issue if your app is truly slowing down.

Does your game save data to NSUserDefaults or to any arrays? If so, as the game is played and more and more items are added to an array, it will take longer to loop through it, hence gradually slowing down the game.
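To illustrate the effect with a generic sketch (C++ here rather than Objective-C, and the names are made up, not from the asker's code): a per-frame linear scan gets slower as the history grows, while a hash-based lookup stays flat.

#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical check run every frame against every word played so far.
// With a plain array, each check is O(n), and n grows for the whole
// session -- exactly the kind of cost that compounds after 20 minutes.
bool alreadySelected(const std::vector<std::string>& history,
                     const std::string& word) {
    for (const auto& w : history)          // O(n) scan, n keeps growing
        if (w == word) return true;
    return false;
}

// The same check against a hash set stays O(1) on average, no matter
// how long the game has been running.
bool alreadySelected(const std::unordered_set<std::string>& history,
                     const std::string& word) {
    return history.count(word) != 0;
}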


Does optimizing code in TI-BASIC actually make a difference?

I know in TI-BASIC, the convention is to optimize obsessively and to save as many bits as possible (which is pretty fun, I admit).
For example,
DelVar Z
Prompt X
If X=0
Then
Disp "X is zero"
End //28 bytes
would be cleaned up as
DelVar ZPrompt X
If not(X
"X is zero //20 bytes
But does optimizing code this way actually make a difference? Does it noticeably run faster or save memory?
Yes. Optimizing your TI-Basic code makes a difference, and that difference is much larger than you would find for most programming languages.
In my opinion, the most important optimization for TI-Basic programs is size (making them as small as possible). This is important to me since I have dozens of programs on my calculator, which only has 24 kB of user-accessible RAM. That said, it isn't really necessary to spend lots of time trying to save a few bytes of space; instead, I simply advise learning the shortest and most efficient ways to do things, so that when you write programs, they will naturally tend to be small.
Additionally, TI-Basic programs should be optimized for speed. Examples off the top of my head include the quirk with the unclosed For( loop, calculating a value once before a loop instead of recalculating it in every iteration (when possible), and using quickly accessed variables such as Ans and the finance variables whenever a variable must be accessed a large number of times (e.g. 1000+).
A third possible optimization is for run-time memory usage. Every loop, function call, etc. has an overhead that must be stored in the memory stack in order to return to the original location, calculate values, etc. during the program's execution. It is important to avoid memory leaks (such as breaking out of a loop with Goto).
It is up to you to decide how you balance these optimizations. I prefer to:
First and foremost, guarantee that there are no memory leaks or incorrectly nested loops in my program.
Take advantage of any size optimizations that have little or no impact on the program's speed.
Consider speed optimizations, and decide if the added speed is worth the increase in program size.
TI-BASIC is an interpreted language, which usually means there is a huge overhead on every single operation.
The way an interpreted language works is that instead of compiling the program into code that runs on the CPU directly, each operation is a function call into the interpreter, which looks at what needs to be done and then calls functions to complete those subtasks. In most cases, the overhead is a factor or two in speed, and often in stack memory usage as well. The memory usage for non-stack data, however, is usually the same.
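To make that concrete, here is a toy dispatch loop (a generic C++ sketch of how interpreters work, not TI's actual interpreter): every "instruction" pays for a fetch and a switch before any real work happens, which is where the constant-factor overhead comes from.

#include <cstddef>
#include <cstdio>
#include <vector>

enum Op { PUSH, ADD, PRINT, HALT };

// Each operation goes through fetch + dispatch + a function-like body,
// instead of the single CPU instruction a compiler would emit.
void run(const std::vector<int>& code) {
    std::vector<int> stack;
    std::size_t pc = 0;
    for (;;) {
        switch (code[pc]) {                 // dispatch overhead, every op
        case PUSH:
            stack.push_back(code[pc + 1]);  // operand fetch overhead
            pc += 2;
            break;
        case ADD: {                         // one native add, plus baggage
            int b = stack.back(); stack.pop_back();
            stack.back() += b;
            ++pc;
            break;
        }
        case PRINT:
            std::printf("%d\n", stack.back());
            ++pc;
            break;
        case HALT:
            return;
        }
    }
}

int main() { run({PUSH, 2, PUSH, 3, ADD, PRINT, HALT}); }  // prints 5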
In your above example you are doing the exact same number of operations, which should mean the two versions run at about the same speed. What you should optimize are things like i = i + 1, which is four operations, into i++, which is two operations (this is just an example; TI-BASIC doesn't support the ++ operator).
This does not mean that all operations take the exact same time: internally, an operation may call hundreds of other functions, or it may be as simple as updating a single variable. The programmers of the interpreter may also have implemented various peephole optimizations that target very specific language constructs; e.g., for(int i = 0; i < count; i++) could be implemented as a collection of expensive interpreter functions that behave as if i were generic, or it could be optimized into a compiled loop that just updates the variable i and re-evaluates count.
Now, not all interpreted languages are doomed to this pale existence. JavaScript, for example, used to be one, but these days all major JS engines JIT-compile the code to run directly on the CPU.
UPDATE: Clarified that not all operations are created equal.
Absolutely, it makes a difference. I wrote a full-scale color RPG for the TI-84+CSE, and let me tell you, without optimizing any of my code, the game would flat out not run. At present, on the CSE, Sorcery of Uvutu can only run if every other program is archived and everything else is moved out of RAM. The programs and data storage alone take up 20 KB of RAM, just 1 KB under all of the available user memory. With all the variables in use, free memory approaches dangerously low levels. There were points in my development where, due to poor optimization, I couldn't even start the game without getting a "memory all gone" error. I had plans to implement various extra features, but due to space and speed concerns, it was impossible to do so. And that's only the space consideration.
In the speed department, the game became, and still is, slow in the overworld. Walking around in the overworld is painfully slow compared to other games, and that's because of everything that code has to do: check for collisions, check whether the user is moving to a new map, check whether they pressed a key that should elicit a response, check whether a battle should start, and more. I was only able to make slight optimizations to the walking speed, but even so, I could clearly tell I had made improvements. It was still pretty awfully slow (at least compared to every other port I've made), but I made it a little more tolerable.
In summary, from my own experience crafting a large project, I can say that in TI-Basic, optimizing code does make a difference. As other answers mentioned, TI-Basic is an interpreted language. This means the code isn't compiled into faster, lower-level code; instead, each command is read straight out of the program as it executes, interpreted, and dispatched to the subroutines needed to carry it out, and then the interpreter returns to read the next line. On top of that, the CPU in the TI-84+ series, the Zilog Z80, was designed in 1976, so you get a rather slow interpreter, especially for this day and age. As such, the fewer commands you run, and the more you take advantage of system quirks, such as Ans being the fastest variable and the one that can hold the most types of data (integers/floats, strings, lists, matrices, etc.), the better the performance you're going to get.
Sources: My own experiences, documented here: https://codewalr.us/index.php?topic=778.msg27190#msg27190
TI-84+CSE RAM numbers came from here: https://education.ti.com/en/products/calculators/graphing-calculators/ti-84-plus-c-se?category=specifications
Information about the Z80 came from here: http://segaretro.org/Zilog_Z80
It depends. If it's just a basic math program, then no. For big games, then YES. The TI-84 has only 3.5 MB of space available, with the combo of an ancient Z80 processor and a whopping 128 KB of RAM. TI-BASIC is also quite slow since it's interpreted (look it up for further information), so if you want to make fast-running games, then YES, optimization is very important.

Strange thing in memory management for iOS development

I have an app on my iPod.
1. Open the app and look at memory in Instruments (Activity Monitor): it's 8.95 MB.
2. Click a button; it adds a UIImageView with a large image to the screen. The memory is now 17.8 MB.
3. Remove the UIImageView from the screen and wait a second; the memory is now 9.09 MB.
I am sure the UIImageView is released after it is removed from the screen; it's very simple code.
So when it is removed, the state of the app should be the same as before the UIImageView was added to the screen, right? But why is the memory 9.09 MB rather than 8.95 MB? If you add a more complex view to the screen, the difference is even more obvious.
This is normal. It's due to a "lazy grow, lazy shrink" algorithm. What that means is that you have a data structure that can be sized for small numbers of items or large numbers of items. The sizing for small numbers of items uses very little memory but isn't efficient when handling large numbers of items. The sizing for large numbers is very efficient for managing large collections of things, but uses more memory to index the objects.
A "lazy grow, lazy shrink" algorithm tries to avoid the cost of resizing a structure's index by only growing the index if it's much too small and only shrinking it if it's much too big. For example, a typical algorithm might grow the index only if its ideal size is at least three times bigger than it is and shrink it only if it's more than three times its ideal size. This is also needed to prevent large numbers of resize operations if an application rapidly allocates and frees collections of resources -- you want the index size to be a bit 'sticky'.
When you open the large object and consume GUI objects, you make the index much too small, and it grows. But when you close the large object, you make the index only a bit too big, so it doesn't shrink.
If the device comes under memory pressure, the index will shrink. If the application continues to reduce its use of UI resources, the index will shrink. If the application uses more UI resources, the index will not need to grow again quite as soon.
A good analogy might be stacks of paper on your desk. If you have 30 papers you might need to find, you might keep them in 4 stacks. But if you have 5,000 papers, 4 stacks will make searching tedious. You'll need more stacks in that case. So when the number of papers gets too big for 4 stacks, you need to re-index into a greater number of stacks. But then when the number gets small, you won't bother to constantly re-index until you have way too many stacks, because searching is still pretty fast.
When you're done handling all those papers, your desk has a few extra stacks. That saves it from re-indexing the next time it needs to handle a lot of papers.
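Here is what that hysteresis might look like in code (a self-contained C++ sketch of the general idea, with made-up thresholds and names; this is not Apple's actual data structure):

#include <cstddef>
#include <cstdio>

// A minimal sketch of "lazy grow, lazy shrink" with the 3x hysteresis
// described above.
class LazyIndex {
    std::size_t capacity_ = 16;   // current index size
    std::size_t count_ = 0;       // items currently tracked

    void maybeResize() {
        if (count_ > capacity_ * 3)                  // much too small: grow
            capacity_ = count_;
        else if (capacity_ > count_ * 3 && capacity_ > 16)  // much too big
            capacity_ = count_ > 16 ? count_ : 16;   // finally shrink
    }

public:
    void add(std::size_t n)    { count_ += n; maybeResize(); }
    void remove(std::size_t n) { count_ = n < count_ ? count_ - n : 0; maybeResize(); }
    std::size_t capacity() const { return capacity_; }
};

int main() {
    LazyIndex idx;
    idx.add(1000);                           // way too small -> grows to 1000
    idx.remove(600);                         // only a bit too big -> "sticky"
    std::printf("%zu\n", idx.capacity());    // still 1000
    idx.remove(350);                         // now much too big -> shrinks
    std::printf("%zu\n", idx.capacity());    // 50
}

The "sticky" capacity after the first remove is the analogue of the 9.09 MB reading: the index keeps some slack so it won't have to regrow immediately.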

Scattered-write speed versus scattered-read speed on modern Intel or AMD CPUs?

I'm thinking of optimizing a program by taking a linear array and writing each element to an arbitrary location (random-like from the perspective of the CPU) in another array. I am only doing simple writes and am not reading the elements back.
I understand that a scattered read on a classical CPU can be quite slow, as each access will cause a cache miss and thus a processor stall. But I was thinking that a scattered write could technically be fast, because the processor isn't waiting for a result and thus may not have to wait for the transaction to complete.
I am unfortunately unfamiliar with all the details of the classical CPU memory architecture, so there may be complications that make this approach quite slow as well.
Has anyone tried this?
(I should say that I am trying to invert a problem I have. I currently have a linear array from which I read arbitrary values -- a scattered read -- and it is incredibly slow because of all the cache misses. My thought is that I can invert this operation into a scattered write for a significant speed benefit.)
In general you pay a high penalty for scattered writes to addresses which are not already in cache, since you have to load and store an entire cache line for each write, hence FSB and DRAM bandwidth requirements will be much higher than for sequential writes. And of course you'll incur a cache miss on every write (a couple of hundred cycles typically on modern CPUs), and there will be no help from any automatic prefetch mechanism.
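A quick way to see the penalty is to time the two patterns directly (a rough, self-contained C++ sketch; exact numbers will vary by machine, but the gap between the two loops is the point):

#include <algorithm>
#include <chrono>
#include <cstdio>
#include <numeric>
#include <random>
#include <vector>

int main() {
    const std::size_t N = 1 << 24;            // ~16M ints, well beyond cache
    std::vector<int> src(N, 1), dst(N, 0);
    std::vector<std::size_t> idx(N);
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::mt19937 rng(42);
    std::shuffle(idx.begin(), idx.end(), rng);

    auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < N; ++i) dst[i] = src[i];       // sequential
    auto t1 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < N; ++i) dst[idx[i]] = src[i];  // scattered
    auto t2 = std::chrono::steady_clock::now();

    using ms = std::chrono::duration<double, std::milli>;
    std::printf("sequential: %.1f ms\n", ms(t1 - t0).count());
    std::printf("scattered:  %.1f ms\n", ms(t2 - t1).count());
    std::printf("checksum: %d\n", dst[N / 2]); // keep the writes observable
}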
I must admit, this sounds kind of hardcore. But I'll take the risk and answer anyway.
Is it possible to divide the input array into pages and read/scan each page multiple times? On every pass through a page, you only process (or output) the data that belongs to a limited number of output pages. This way you only get cache misses at the start of each input-page loop, while the writes stay within a region small enough to be cached.
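Something like this (a hedged C++ sketch of the multi-pass idea; BLOCK, the function name, and the argument layout are all made up for illustration):

#include <cstddef>
#include <vector>

// Multi-pass scatter: only perform the writes that land in the current
// output block, so the scattered stores stay inside a cache-sized window.
void blockedScatter(const std::vector<int>& src,
                    const std::vector<std::size_t>& dstIndex,
                    std::vector<int>& dst) {
    const std::size_t BLOCK = 1 << 16;          // tune to your cache size
    for (std::size_t base = 0; base < dst.size(); base += BLOCK) {
        const std::size_t end = base + BLOCK;
        // Sequential re-reads of the input are prefetch-friendly and cheap.
        for (std::size_t i = 0; i < src.size(); ++i) {
            const std::size_t d = dstIndex[i];
            if (d >= base && d < end)
                dst[d] = src[i];                // write stays within the block
        }
    }
}

You pay for re-reading the input once per output block, but sequential reads are cheap and prefetch-friendly, so trading them for cache-resident writes can still come out ahead.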

What causes page fault and how to minimize them?

When examining a process in Process Explorer, what does it mean when there are a large number of page faults? The application is processing quite a bit of data and the UI is not very responsive. Are there optimizations to the code that could reduce or eliminate page faults? Would increasing the physical RAM of the system make a difference?
http://en.wikipedia.org/wiki/Page_fault
Increasing the physical RAM on your machine could result in fewer page faults, although design changes to your application will do much better than adding RAM. In general, having a smaller memory footprint, and arranging for things that are often accessed around the same time to sit on the same page, will decrease the number of page faults. It can also help to do everything you can with a piece of data while it's in memory, all at once, so that you don't need to bring it back in many separate times, which can cause page faults (a.k.a. thrashing).
It might also help to make sure that memory accessed close together in time is close together in space (e.g., if you have some objects, place them in an array). If these objects have lots of data that is very infrequently used, move that data into another class and have the first class hold a reference to the second one. This way you touch less memory most of the time.
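For example (a generic C++ sketch; the types are invented to show the layout, they're not from the question):

#include <memory>
#include <string>
#include <vector>

// Rarely used fields live behind a pointer ("cold"), so a scan over the
// hot fields touches far fewer pages.
struct ParticleCold {
    std::string debugName;        // rarely accessed
    double spawnTime = 0;
};

struct Particle {                 // hot data: small and contiguous
    float x = 0, y = 0, dx = 0, dy = 0;
    std::unique_ptr<ParticleCold> cold;   // only followed when needed
};

// Updating an array of small hot structs keeps the working set dense,
// faulting in far fewer pages per pass than one big mixed struct would.
void update(std::vector<Particle>& ps, float dt) {
    for (auto& p : ps) { p.x += p.dx * dt; p.y += p.dy * dt; }
}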
A design option would be to write a memory cache system that creates memory lazily (on demand). Such a cache would hold a collection of pre-allocated memory chunks, accessed by their size: for example, an array of N lists, each list holding M buffers, where list i is responsible for handing out memory in a certain size range (say, 2^i bytes for i = 0..N-1). Even if you want to use less than 2^i bytes, you simply don't use the extra space in the buffer.
This trades a small amount of wasted memory for caching and fewer page faults.
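A bare-bones sketch of that idea in C++ (the class name and sizes are illustrative; a real implementation would also need thread safety and a way to trim the cache):

#include <cstddef>
#include <new>
#include <vector>

// N free lists; list i hands out buffers of 2^i bytes, created on demand
// and recycled instead of hitting the system allocator every time.
class ChunkCache {
    static const std::size_t N = 20;             // size classes up to 512 KB
    std::vector<void*> free_[N];

    static std::size_t classFor(std::size_t size) {
        std::size_t i = 0;
        while ((std::size_t{1} << i) < size) ++i; // smallest 2^i >= size
        return i;
    }

public:
    void* acquire(std::size_t size) {
        const std::size_t i = classFor(size);
        if (!free_[i].empty()) {                  // reuse a cached chunk
            void* p = free_[i].back();
            free_[i].pop_back();
            return p;
        }
        return ::operator new(std::size_t{1} << i);  // lazy create
    }
    void release(void* p, std::size_t size) {
        free_[classFor(size)].push_back(p);       // keep it for next time
    }
};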
Another option is to use nedmalloc.
good luck
Lior

What is the optimal freeheap to totalheap ratio?

What is the optimal freeheap to totalheap ratio? At what values of this ratio should I consider increasing the heap size/ decreasing the heap size?
The ideal momentary ratio is 1. Ideally, your JVM would consume exactly the memory it required, no more and no less. That's a very hard target to reach ;)
The problem (as TNilsson points out) is that your application's memory requirements change over time as it does work, so you want it to have enough space not to cause constant collection/compaction more often than you can tolerate, and you want it to consume little enough space that you don't have to buy more RAM.
There is no single easy answer, let me give you two examples:
Example 1 - Your program allocates 100M worth of memory at startup, and then does not allocate any memory whatsoever for the rest of its run.
In this case, you clearly want to have a heap size of 100M (Well, perhaps 101 or something, but you get the point...) to avoid wasting space.
Example 2 - Your program allocates 10M of memory per second. None of the data is persisted longer than 1 second. (e.g. you are doing a calculation that requires a lot of temporary data, and will return a single integer when you are done...)
Knowing the exact numbers is perhaps not so realistic, but it's an example.
Since you have 10M of "live" data, you will need at least a 10M heap. Beyond that, you need to check how your garbage collector works. Simplified, the time a GC takes to complete is O(live set); that is, the amount of "dead" data does not really enter into it. With a constant live-set size, each GC takes the same time no matter your heap size, but a larger heap fills up less often, so collections run less frequently and total GC overhead drops. This leads to: larger heap -> better throughput.
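To put a rough number on "larger heap -> better throughput", here is a back-of-the-envelope model (my own sketch, not something from the original answer). Let a be the allocation rate, L the live-set size, H the heap size, and c the GC cost per live byte. A collection is needed roughly every (H - L)/a seconds and costs c*L seconds, so:

\[
\text{GC overhead} \approx \frac{cL}{(H-L)/a} = \frac{a\,c\,L}{H-L}
\]

With a and L fixed, doubling the free space H - L halves the fraction of time spent collecting.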
(Now, to really mess things up, add stuff like heap compaction and the picture becomes even less clear...)
Conclusion
It's a simplified version of the matter, but the short answer is: it depends.
This probably depends on the rate that you allocate new objects. Garbage collection involves a lot of work tracing references from live objects. I have just been dealing with a situation where there was plenty of free memory (say 500 MB used, 500 MB free) but so much array allocation was happening that the JVM would spend 95% of its time doing GC. So don't forget about the runtime memory behaviour.
All those performance tuning articles that say something like "object allocation is really fast in Java" without mentioning that some allocations cause 1 second of GC time make me laugh.