Why do Cocoa apps use so much memory?

Why do Cocoa apps use so much memory? - objective-c

Even the standard blank-window Cocoa app that gets built when you make a new Cocoa project in Xcode uses almost 6 MB of memory. What's the reason for this? Is it possible to make an app use less, or does OS X simply manage memory differently for Cocoa apps?
Not that I'm complaining. I know that performance "hardly matters anymore" (edit: what I mean is, it matters less than readability/maintainability/the programmer's time). I'm just curious.

OS X does a lot of magic with shared memory and copy-on-write pages, so chances are that it doesn't take that much physical RAM for every application.
You can check exactly how memory blocks are mapped by running:
sudo vmmap <PID of the process>

Depends on the all the framework (APIs) you use. Combine that with the VM allocations done by low level ops.
It's only worth trying to reduce the heap alloc (total), as well as the resident size of the code. Making sure your data structs are allocated efficiently and trying to compile with the ever-so-famous "-Os" optimization flag (size optimization). There isn't much you can do about the VM eaten by Cocoa. I wouldn't really worry about it.

This is clearly a 'WTF' moment for developers in general. The question is usually - why does my trivial application use up so much memory.
The answer is down to the underlying framework. You could argue that 6MB is too much, but really, it is nothing.
It's not rare to see computers come with 2GB of memory these days. The stock IMAC is 4GB. The whole point of the computer industry is to use up all the resources a machine has so that it continues to evolve.
Yes you should avoid ineffecincies where possible (Don't load up a 5million point array at start up for instance). But unless your beta demonstrates you fudged up just keep it on the list of todo's.

I'm a bit out on a limb here, but I guess it's because all the libraries that get added have to do quite a bit of setting up and there is no need to garbage collect, so they simply get to waste memory; plus, even if all memory got autoreleased, it would wait until the first idle event, which is after the creation of the window. Delete unneeded libraries/frameworks, or force a garbage collect somewhere after loading the window from the nib and see how much it goes down if you're so concerned.
I am not concerned about it. Some of the memory might be returned later, and the rest is the price you pay for a powerful framework.

A factor which is not directly linked to cocoa but is valid to frameworks in general is that the overhead is not linear. There is usually a fixed and a variable "price", in terms of overhead, to use the framework.
When you create a simple blank window, the fixed overhead is crushing, but when you create an application with tens of windows, dialogs, controls and all, the initial fixed overhead becomes negligible, against the size of the application itself.

Related

is it recommended to use SPI flash to run code instead internal flash due to memory limitation of internal flash?

We used the LPC546xx family microcontroller in our project, currently, at the initial stage, we are finalizing the software and hardware requirements. The basic firmware size (which contains RTOS, 3rd party stack, library, etc...) currently is 480 KB. Now once full application developed than the size will exceed the internal flash size (512KB) and plus we needed storage which can hold firmware update image separately.
So we planned to use SPI flash (S25LP064A-JBLE, http://www.issi.com/WW/pdf/IS25LP032-064-128.pdf, serial flash memory) of 4MB\8MB to boot and run firmware.
is it recommended to run code from SPI flash? how can I map external flash memory directly to CPU memory space? Can anyone give an example that contains this memory mapping(linker script etc..) or demo application in which LPC546xx uses SPI FLASH?

Generally speaking it's not recommended, or differently put: the closer to the CPU the better. Both S25LP064A and LPC546xx however support XIP, so it is viable.
This is not a trivial issue as many aspects are affecting. I.e. issue is best avoided and should really have been ironed out in the planning stage. Embedded Systems are more about compromising than anything and making the right/better choices takes skill end experience.
Same question with replies on the NXP forum: link
512K of NVRAM is huge. There are almost certainly room for optimisations even if 3'rd party libraries are used.
On a related note this discussion concerning XIP should give valuable insight: link.
I would strongly encourage use of file-systems if not done already, for which external storage is much better suited. The further from the computational unit, the more relevant. That's not XIP and the penalty is copy-to-RAM either way you do it. I.e. performance will be slower. But in my experience, the need for speed has often-times not been thoroughly considered and at least partially greatly overestimated.
Regarding your mentioning of RTOS and FW-upgrade:
Unless it's a poor RTOS there's file-system awareness built in. Especially for FW upgrading (Note: you'll need room for 3 images, factory reset included), unless already supported by the SoC-vendor by some other means (OTA), it will make life much easier and less risky. If there's no FS-awareness, it can be added.
FW upgrade requires a lot of extra storage. More if simpler. Simpler is however also safer which especially for FW upgrades matters hugely. In the simplest case (binary flat image), you'll need at least twice the amount of memory you're already consuming.
All-in-all: I think the direction you're going is viable and depending on the actual situation perhaps your only choice.

Heap profiling on ARM

I am developing a GUI-heavy C++ application on a Freescale MX51-based board Linux 2.6.35. I would like to perform heap profiling.
Unfortunately, all heap profiling tools I have found have either been too intrusive or ostensibly non-working on ARM. Specific tools I've tried:
Valgrind Massif: unworkable on my platform due to the platform's feeble CPU. The 80% CPU time overhead introduced by Massif causes a range of problems in my application that cannot be compensated for.
gperftools (formerly Google Performance Tools) tcmalloc: All features of this rather un-intrusive, library-based libc malloc() replacement work on my target except for the heap profiler. To rephrase, the thread caching allocator works but the profiler does not. I'll explain the failure mode of the profiler below for anyone curious.
Can anyone suggest a set of replacement tools for performing C++ heap profiling on ARM platforms? Ideal output would ultimately be a directed allocation graph, similar to what gperftools' tcmalloc outputs. Low resource utilization is a must- my platform is highly resource constrained.
Failure mode of gperftools' tcmalloc explained:
I'm providing this information only for those that are curious; I do not expect a response. I'm seeing something similar to gperftools' issue #407 below, except on ARM rather than x86.
Specifically, I always get the message "Hooked allocator frame not found, returning empty trace." I spent some time debugging the issue and it appears that, when dynamically linking the tcmalloc library, frame pointers at the boundary between my application and the dynamic library are null- the stack cannot be walked "above" the call into the dynamic library.
gperftools issue #407: https://github.com/gperftools/gperftools/issues/410
stackoverflow user seeing similar problems on ARM: Missing frames on shared libraries on ARM

Heaps. Many ways to do them, but I've only run across 3 main types that matter in embedded land:
Linked list heaps. Each alloc is tracked in a "used" list. Once freed, they are dropped into a "free" list. On freeing, adjacent blocks of free memory are "joined" into larger pieces. Allocs can be any size. Each alloc and free is a O(N) op as it has to traverse the free list to give you a piece of memory plus break the free block into a size close to what you asked for while leaving the remaining block in the free list. Because of the increasing overhead per alloc, this system cannot be used by itself on smaller systems. This also tends to cause memory fragmentation over time if steps aren't taken to minimize it.
Fixed size (unit) heaps. You break your heap into equal size (smaller) parts. This wastes memory a bit, depending on how big the chunks are (and how many different sized, fixed allocator heaps you create), but alloc and free are both O(1) time operations. No searching, no joining. This style is often combined with the first one for "small object allocations" as the engines I've worked with have 95% of their allocations below a set size (say 256 bytes). This way, you use the unit heap for small allocs for huge speed and only minimal memory loss, while using the list heap for larger allocs. No external fragmentation of memory either.
Relocatable memory heaps. You don't give out pointers to memory, but handles. That way, behind the scenes, you can change memory pointers when needed to remove fragmentation or whatever. High overhead. High pain the the #$$ quotient as it's easy to abuse and get dangling pointer all over. Also added overhead for each memory dereference. But wanted to mention it.
There's some basic patterns. You can find all sorts of libs out in the wild that use them and also have built in statistics for number of allocs, fragmentation, and other useful stats. It's also not the hard to roll your own really, though I'd not recommend it for anything outside of satisfying curiosity as debugging without a working malloc is painful indeed. Adding thread support is pretty straightforward as well, but again, downloading a ready made solution is the better choice.
The above info applies to all platforms, ARM or otherwise, though most of my experience has been on low level ARM stuff so the above info is battle tested for your platform. Hope this helps!

retrieve video ram usage iphone

i have see this article about retrieve memory usage off iphone app
programmatically-retrieve-memory-usage-on-iphone It's great !
In my project i want to retrieve the available VRAM free, because my app load many textures, and i must preload theses into the video Ram for fast rendering.
but on the VM_statistics i don't view theses properties : vm_statistics MAN page
Thanks a lot for your help.

As you've seen so far, getting hard numbers for GL texture memory usage is quite difficult. It's complicated further by the fact that CoreAnimation will also use GL Texture memory without "consulting" you, including from processes other than yours.
Practically speaking, I suggest that you use the VM Tracker instrument in Instruments to watch changes in the VM pages your process maps under the IOKit tag. It's a bit crude, but it's the best approach I've found. In my experience, this process is largely guess and check.
You asked specifically for a way to determine the amount of free VRAM, but even if you could get that info, it's not really likely to be helpful. Even if your app is totally OpenGL and uses no UIViews or CoreAnimation layers other processes, most importantly those more privileged than yours, can consume that memory at any time, either explicitly or implicitly through CoreAnimation. It's also probably safe to assume that if your app prevents those more-privileged apps from getting the texture memory they need, your process will be killed.
Put differently, even if you could ascertain the instantaneous state of the GL texture memory, you probably couldn't count on being the only consumer of that resource, so it's pretty useless.
At the end of the day, you should spend your effort designing your app to be a good citizen in terms of GL memory and manage (read: minimize) your own consumption of texture memory. iOS devices are not old-school game consoles -- you are not the only thing running -- so you need to be mindful and tolerant of that fact, lest your app be one of those where everyone has to reboot their phone every few minutes in order to use it.

boost::serialization high memory consumption during serialization

just as the topic suggests I've come across a slight issue with boost::serialization when serializing a huge amount of data to a file. The problem consists of the memory footprint of the serialization part of the application taking around 3 to 3.5 times the memory of my objects being serialized.
It is important to note that the data structure I have is a three dimensional vector of base class pointers and a pointer to that structure. Like this:
using namespace std;
vector<vector<vector<MyBase*> > >* data;
This is later serialised with a code analog to this one:
ar & BOOST_SERIALIZATION_NVP(data);
boost/serialization/vector.hpp is included.
Classes being serialised all inherit from "MyBase".
Now, since the start of my project I've used different archives for serialization from typical binary_archive, text, xml and finally polymorphic binary/xml/text. Every single one of these acts exactly the same way.
Typically this wouldn't be a problem if I had to serialize small amounts of data but the number of classes I have are in the milions (ideally around 10 milion) and the memory usage as I've been able to test it shows consistently that the memory allocated by boost::serialization part of the code is around 2/3 of the application whole memory footprint while writing the file.
This amounts to around 13.5 GB of RAM taken for 4 milion objects where the objects themselves take 4.2GB. Now this is as far as I've been able to take my code since I don't have access to a machine with more than 8GB of physical RAM. I should also note that this is a 64bit application being run on a Windows 7 professional x64 edition but the situation is similar on an Ubuntu box.
Anyone has any idea how I would go about troubleshooting this as it is unacceptable for me to have such high memory requirements for an application that will not use as much memory while running as it does while serializing.
Deserialization isn't as bad, as it allocates around 1.5 times the needed memory. This is something I could live with.
Tried turning tracking off with boost::archive::archive_flags::no_tracking but it acts exactly the same.
Anyone have any idea what I should do?

Using valgrind I found that the main reason of memory consumption is a map inside the library to track pointers. If you are certain that you do not need pointer tracking ( it means you are sure that there is no pointer aliasing) disable tracking. You can find here the main concepts of disable tracking. In short you must do something like this:
BOOST_CLASS_TRACKING(vector<vector<vector<MyBase*> > >, boost::serialization::track_never)
In my question I wrote a version of this macro that you could disable tracking of a template class. This must have a significant impact on your memory consumption.
Also notice that there are pointers inside any containers If you want tracking never you must disable tracking of them too. Currently I could not find any way to do this properly.

Points to be considered while designing or coding for lesser footprint deliverables

Please post the points one should keep in mind while designing or coding for lesser footprint deliverables for embedded systems.
I am not giving compiler or platform details, as I want generic information. But, any specific information on Linux based OS is also welcome.

Depends on how low you want to get. I'm currently coding for fiscal printers, and there's no OS, and the main rule is no dynamic memory allocation. The funny thing is that I still convinced the crew to code fully modern C++ ;).
Actually there are a few rules we decided upon:
no dynamic allocation
hence, no STL
no exception handling (obvious reasons)

There isn't a general answer, only ones specific to language/platform ... but
Small memory footprint ...
Don't use Java, C#/mono, PHP, Perl, Python or anything with garbage collection
Get as close to the metal as feasible, Use C
Do alot of profiling to see where memory is getting allocated, if you are using dynamic allocation
Ensure you prevent heap-fragmentation by allocating sensible chunks and sizes of the heap
Avoid recursive functions especially those that use malloc(). Better allocating a chunk and passing a pointer around.
use free() ;)
Ensure your types are no bigger than required
Turn on compiler optimizations
There will be more.

for real low footprint consider doing Assembly directly.
We all know that Hello World in C or C++ is 20kb+(because of all the default libraries which get linked). In Assembly this overhead is gone. As pointed out in the comments one can reduce the standard libraries quite a bit. However, the fact remains that the code density you can get when coding assembly is much higher than a compiler will generate from a higher language. So for code where every byte matters, use assembly.
also when programming on devices with less capable processors, programming in assembly language might be your only way to do make the program fast enough for it to be realtime enough to (for instance) control machines

When faced with such constraints, it is advisable to pre-allocate memory in order to guarantee that the system will work under load. A design pattern such as "object pooling" can be used to share resources within the system.
The C language enables tight resource (i.e. memory & compute cycles) control. It should be strongly considered.
Avoid recursion as it is easy to abuse and can result in stack overflow conditions.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas