Can static analysis detect memory leaks?

I received my ISTQB certification a long time ago, and I remember that the syllabus makes the following distinction:
- static analysis: performed on the source code; detects unreachable code, unassigned values, etc.
- dynamic analysis: can detect memory leaks, etc.; requires execution (profiling).
But when I search today, I see various sites and sources mentioning that static analysis is capable of detecting memory leaks too.
So I wonder, is static analysis really capable of that? And if so, what is the difference from dynamic analysis in terms of results?

Being one of the developers of a static analyzer, I can state that searching for memory leaks is an extremely complicated, and sometimes impossible, task for static code analysis. Static analyzers are genuinely weak in this area, and we shouldn't expect much from them. Dynamic analyzers are much stronger at finding memory leaks, and if your task is to find them, you should consider dynamic, not static, analysis.
Yes, static analyzers are able to find simple cases of memory leaks. But in practice, memory leaks mostly occur when the code is complicated and the memory is allocated and freed in different parts of the program. That is why static analysis is really not very effective here.

A well designed/implemented static analysis tool can detect many cases where some code must have a leak, merely by analyzing the code. Tools like Coverity/Prevent do this pretty well.
Such tools can also detect many cases where there might be a leak (undecidability -- ultimately the halting problem -- prevents them from knowing for sure). There is a huge argument about whether the tool should report these, because they might be false positives, and false positives are a waste of programmer time. [Worse: a programmer who wastes time on several false positives often quits using the tool altogether, and then even the value of the truly detected bugs is lost.]
Dynamic analysis tools can usually tell if a leak happens, at the moment it happens at runtime. (Imagine a pointer to heap being held in a local variable, and that local variable going out of scope). (See our CheckPointer tool for a dynamic analysis tool that can detect virtually every stack/heap allocation/pointer misuse error encountered at runtime).

I'm pretty sure a static analyzer could catch:
void MyFunction()
{
    char * leakable = new char[1000]; // never deleted; the only pointer is lost on return
}
so the answer to your question is self-evidently "Yes".
A more interesting question is whether it can catch more subtle leaks. The answer there is "often yes", provided it has access to all the source code involved, or to a representation of the contract for the methods involved. (If only a comment says "the caller is responsible for releasing the returned object", the static analyzer may not catch it; but if that same concept is expressed in code, or can be derived by analyzing the code, a static analyzer can find the problem -- sometimes.)
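To make "subtle" concrete, here is a rough sketch in plain C (the function and its contract are made up). The leak exists on only one path, so the analyzer has to reason path-sensitively; and the final return transfers ownership to the caller, a contract the tool can only verify if it is expressed somewhere it can see:

#include <stdlib.h>
#include <string.h>

char *build_report(int include_footer)
{
    char *buffer = malloc(4096);      /* allocation point */
    if (buffer == NULL)
        return NULL;
    strcpy(buffer, "header\n");
    if (!include_footer)
        return NULL;                  /* leak: buffer is never freed on this path */
    strcat(buffer, "footer\n");
    return buffer;                    /* caller is responsible for free() */
}

Path-sensitive analyzers (Clang's, for instance) do catch this shape of bug; the harder cases are when the allocation and the forgotten free live in different translation units.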

Static analysis is capable of detecting the potential for a memory leak, in the form of a construct that can be anticipated to lead to memory leaks. It cannot, however, detect an actual memory leak at runtime, since it never examines the execution of the codebase.

Related

Thread safety of primitive value type properties - Objective-C

Question
I am working on a project where I am concerned about the thread safety of an object's properties. I know that when a property is an object such as an NSString, I can run into situations where multiple threads are reading and writing simultaneously. In this case you can get a corrupt read and the app will either crash or result in corrupted data.
My question is for primitive value type properties such as BOOLs or NSIntegers. I am wondering if I can get into a similar situation where I read a corrupt value when reading and writing from multiple threads (and the app will crash)? In either case, I am interested in why.
Clarification - 1/13/17
I am mostly interested in whether a primitive value type property is susceptible to crashing under simultaneous access from multiple threads in a different way than an object such as an NSMutableString, a custom created object, etc. In addition, whether there is a difference between accessing memory on the stack vs the heap relative to multithreading.
Clarification - 12/1/17
Thank you to @Rob for pointing me to the answer here: stackoverflow.com/a/34386935/1271826! This answer has a great example showing that, depending on the architecture you are on (32-bit vs 64-bit), you can get an undefined result when using a primitive property.
Although this is a great step towards answering my question, I still wonder two things:
Is there a multithreading difference when accessing a primitive value property on the stack vs the heap (as noted in my previous clarification)?
If you restrict a program to running on one architecture, can you still find yourself in an undefined state when accessing a primitive value property, and why?
I should note that there has been a lot of conversation around atomic vs nonatomic in response to this question. Although this is generally an important concept, this question has little to do with preventing undefined multithreading behavior by using the atomic property modifier or any other thread-safety approach such as using GCD.
If your primitive value type property is atomic, then you're assured it cannot be corrupted because you're reading it from one thread while setting it from another (as long as you only use the accessor methods and don't interact with the backing ivar directly). That's the entire purpose of atomic. And, as you suggest, this is only applicable to fundamental data types (or objects that are both immutable and stateless). But in these narrow cases, atomic can be useful.
Having said that, this is a far cry from concluding that the app is thread-safe. It only assures you that the access to that one property is thread-safe. But often thread-safety must be considered within a broader context. (I know you assure us that this is not the case here, but I qualify this for future readers who too quickly jump to the conclusion that atomic is sufficient to achieve thread-safety. It often is not.)
For example, if your NSInteger property is "how many items are in this cache object", then not only must that NSInteger have its access synchronized, but it must also be synchronized in conjunction with all interactions with the cache object (e.g. the "add item to cache" and "remove item from cache" tasks, too). And in these cases, since you'll synchronize all interaction with this broader object somehow (e.g. with a GCD queue, locks, the @synchronized directive, whatever), making the NSInteger property atomic becomes redundant and therefore modestly less efficient.
Bottom line, in limited situations, atomic can provide thread-safety for fundamental data types, but frequently it is insufficient when viewed in a broader context.
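A minimal sketch of that cache example (the class and queue names are hypothetical). Every interaction with the cache goes through one serial queue, so the count and the items always change together, and an atomic qualifier on the count would be redundant:

@interface Cache : NSObject
- (void)addItem:(id)item;
- (NSUInteger)count;
@end

@implementation Cache {
    dispatch_queue_t _queue;    // serializes every interaction with the cache
    NSMutableArray *_items;
    NSUInteger _count;          // nonatomic is fine: only touched on _queue
}

- (instancetype)init {
    if ((self = [super init])) {
        _queue = dispatch_queue_create("com.example.cache", DISPATCH_QUEUE_SERIAL);
        _items = [NSMutableArray array];
    }
    return self;
}

- (void)addItem:(id)item {
    dispatch_async(_queue, ^{
        [self->_items addObject:item];
        self->_count = self->_items.count;   // count and items change together
    });
}

- (NSUInteger)count {
    __block NSUInteger result;
    dispatch_sync(_queue, ^{ result = self->_count; });
    return result;
}
@end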
You later say that you don't care about race conditions. For what it's worth, Apple argues that there is no such thing as a benign race. See WWDC 2016 video Thread Sanitizer and Static Analysis (about 14:40 into it).
Anyway, you suggest you are merely concerned whether the value can be corrupted or whether the app will crash:
I am wondering if I can get into a similar situation where I read a corrupt value when reading and writing from multiple threads (and the app will crash)?
The bottom line is that if you're reading from one thread while mutating on another, the behavior is simply undefined. It could vary. You are simply well advised to avoid this scenario.
In practice, it's a function of the target architecture. For example, with a 64-bit type (e.g. long long) on a 32-bit x86 target, you can easily retrieve a corrupt value, where one half of the 64-bit value is set and the other is not. (See https://stackoverflow.com/a/34386935/1271826 for an example.) With primitive types this results in merely nonsensical, invalid numeric values; for pointers to objects, it would obviously have catastrophic implications.
But even if you're in an environment where no problems manifest, it's an incredibly fragile approach to eschew synchronization to achieve thread-safety. It could easily break when run on a new, unanticipated hardware architecture or compiled under a different configuration. I'd encourage you to watch that Thread Sanitizer and Static Analysis video for more information.
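For illustration, a small sketch of that torn-read scenario (the Demo class is made up; you would only expect to observe torn values on a target where a 64-bit access is not a single atomic operation, e.g. 32-bit):

#import <Foundation/Foundation.h>

@interface Demo : NSObject
@property (nonatomic) long long value;   // nonatomic: no synchronization at all
@end
@implementation Demo
@end

int main(void)
{
    @autoreleasepool {
        Demo *demo = [Demo new];
        // Writer thread alternates between two bit patterns.
        dispatch_async(dispatch_get_global_queue(QOS_CLASS_DEFAULT, 0), ^{
            for (;;) { demo.value = 0; demo.value = -1; }
        });
        // Reader thread: any value other than 0 or -1 is a torn read --
        // half of one write combined with half of another.
        for (long i = 0; i < 100000000; i++) {
            long long v = demo.value;
            if (v != 0 && v != -1) {
                NSLog(@"torn read: %llx", (unsigned long long)v);
                break;
            }
        }
    }
    return 0;
}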

Design pattern for catching memory leaks in objective-c?

I have read Apple's memory management guide, and think I understand the practices that should be followed to ensure proper memory management in my application.
At present it looks like there are no memory leaks in my code. But as my code grows more complex, I wonder whether there is any particular pattern I should follow to keep track of allocations and deallocations of objects.
Does it make sense to create some kind of global object, present throughout the execution of the application, which contains a count of the number of active objects of each type? Each object could increment the count of their type in their init method, and decrement it in dealloc. The global object could verify at appropriate times if the count of a particular type is zero or not.
EDIT: I am aware of how to use the Leaks tool, as well as how to analyze the project using Xcode. The reason for this post is to keep track of cases which may not be detected easily through Leaks or Analyze.
EDIT: Also, it seems to make sense to have something like this so that leaks can be detected early in builds by running unit tests that check the global object. I guess that as an inexperienced Objective-C programmer I would benefit from the views of others on this.
Each object could increment the count of their type in their init
method, and decrement it in dealloc.
To do that right, you'll have to do one of the following: 1) override behavior at some common point, such as NSObject's -init, or 2) add the appropriate code to the designated initializer of every single class. Neither seems simple.
The global object could verify at appropriate times if the count of a
particular type is zero or not.
Sounds good, but can you elaborate a bit on "appropriate times"? How would you know at any given point in the life of your program which classes should have zero instances? You'd have a pretty good idea that there should be no objects at the end of the program, but Instruments could tell you the same thing in that case.
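For what it's worth, here's a hedged sketch of option 2 (all names are invented): each class's designated initializer and dealloc call into a shared counter, and a unit test can assert that the live count returns to zero after a scenario completes:

@interface InstanceCounter : NSObject
+ (void)noteInit:(Class)cls;
+ (void)noteDealloc:(Class)cls;
+ (NSInteger)liveCount:(Class)cls;
@end

@implementation InstanceCounter
static NSCountedSet *live;
static dispatch_queue_t queue;

+ (void)initialize {
    if (self == [InstanceCounter class]) {
        live = [[NSCountedSet alloc] init];
        queue = dispatch_queue_create("instance.counter", DISPATCH_QUEUE_SERIAL);
    }
}
+ (void)noteInit:(Class)cls    { dispatch_async(queue, ^{ [live addObject:cls]; }); }
+ (void)noteDealloc:(Class)cls { dispatch_async(queue, ^{ [live removeObject:cls]; }); }
+ (NSInteger)liveCount:(Class)cls {
    __block NSInteger n;
    dispatch_sync(queue, ^{ n = (NSInteger)[live countForObject:cls]; });
    return n;
}
@end

// In each class you care about:
//   - (instancetype)init { ... [InstanceCounter noteInit:[self class]]; ... }
//   - (void)dealloc      { [InstanceCounter noteDealloc:[self class]]; ... }
// A unit test can then assert that liveCount: is zero once a scenario finishes.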
Objective-C has taken several steps to make memory management much simpler. Use properties and synthesized accessors where you can, as they essentially manage your objects for you. A more recent improvement is ARC, which goes even further toward automating most memory management tasks. You basically let the compiler figure out where to put the memory management calls -- it's like garbage collection without the garbage collector. Learn to use those tools well before you try to invent new ones.
Don't go that route... it's a pain in a single-inheritance language. Most importantly, there are excellent tools at your disposal which you should master before deciding you must create some global counter. The global counter already exists in a few tools -- learn them!
The way you combat leaks is to learn how to balance and manage everything correctly as it's written. It's really very simple in hindsight.
ARC is another option -- really that just postpones your understanding.
The first "design pattern" I recommend it to use release instead of autorelease where possible (although generally more useful for over-releases).
Next, run the leaks instrument/util regularly and fix all leaks/zombies immediately.
Third, learn the existing tools as you go! These tools can do really crazy stuff, like record the backtrace of every allocation and every reference count. You can pause your program's execution and view what allocations exist, alloc counts, backtraces, and all sorts of other stats.

How does the new automatic reference counting mechanism work?

Can someone briefly explain to me how ARC works? I know it's different from Garbage Collection, but I was just wondering exactly how it worked.
Also, if ARC does what GC does without hindering performance, then why does Java use GC? Why doesn't it use ARC as well?
Every new developer who comes to Objective-C has to learn the rigid rules of when to retain, release, and autorelease objects. These rules even specify naming conventions that imply the retain count of objects returned from methods. Memory management in Objective-C becomes second nature once you take these rules to heart and apply them consistently, but even the most experienced Cocoa developers slip up from time to time.
With the Clang Static Analyzer, the LLVM developers realized that these rules were reliable enough that they could build a tool to point out memory leaks and overreleases within the paths that your code takes.
Automatic reference counting (ARC) is the next logical step. If the compiler can recognize where you should be retaining and releasing objects, why not have it insert that code for you? Rigid, repetitive tasks are what compilers and their brethren are great at. Humans forget things and make mistakes, but computers are much more consistent.
However, this doesn't completely free you from worrying about memory management on these platforms. I describe the primary issue to watch out for (retain cycles) in my answer here; handling it may require a little thought on your part to mark weak pointers. However, that's minor compared to what you're gaining with ARC.
When compared to manual memory management and garbage collection, ARC gives you the best of both worlds by cutting out the need to write retain / release code, yet not having the halting and sawtooth memory profiles seen in a garbage collected environment. About the only advantages garbage collection has over this are its ability to deal with retain cycles and the fact that atomic property assignments are inexpensive (as discussed here). I know I'm replacing all of my existing Mac GC code with ARC implementations.
As to whether this could be extended to other languages, it seems geared around the reference counting system in Objective-C. It might be difficult to apply this to Java or other languages, but I don't know enough about the low-level compiler details to make a definitive statement there. Given that Apple is the one pushing this effort in LLVM, Objective-C will come first unless another party commits significant resources of their own to this.
The unveiling of this shocked developers at WWDC, so no, people generally weren't aware that something like this could be done. It may appear on other platforms over time, but for now it's exclusive to LLVM and Objective-C.
ARC is just plain old retain/release (MRC) with the compiler figuring out when to call retain/release. It will tend to have higher performance, lower peak memory use, and more predictable performance than a GC system.
On the other hand some types of data structure are not possible with ARC (or MRC), while GC can handle them.
As an example, if you have a class named Node, and Node has an NSArray of children and a single reference to its parent, that "just works" with GC. With ARC (and manual reference counting as well) you have a problem. Any given node will be referenced from its children and also from its parent.
Like:
A -> [B1, B2, B3]
B1 -> A, B2 -> A, B3 -> A
All is fine while you are using A (say via a local variable).
When you are done with it (and B1/B2/B3), a GC system will eventually decide to look at everything it can find starting from the stack and CPU registers. It will never find A,B1,B2,B3 so it will finalize them and recycle the memory into other objects.
When you use ARC or MRC and finish with A, it has a refcount of 3 (B1, B2, and B3 all reference it), and B1/B2/B3 each have a reference count of 1 (A's NSArray holds one reference to each). So all of those objects remain live even though nothing can ever use them.
The common solution is to decide that one of those references needs to be weak (not contribute to the reference count). That will work for some usage patterns, for example if you reference B1/B2/B3 only via A. However, in other patterns it fails: for example, if you sometimes hold onto B1 alone and expect to climb back up via the parent pointer to find A. With a weak reference, if you only hold onto B1, A can (and normally will) evaporate, taking B2 and B3 with it.
Sometimes this isn't an issue, but some very useful and natural ways of working with complex structures of data are very difficult to use with ARC/MRC.
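A minimal sketch of that weak-parent pattern (ARC assumed; the Node class here is hypothetical):

@interface Node : NSObject
@property (nonatomic, strong) NSMutableArray<Node *> *children;  // A retains B1..B3
@property (nonatomic, weak) Node *parent;   // back-pointer adds nothing to the refcount
@end

@implementation Node
- (instancetype)init {
    if ((self = [super init])) {
        _children = [NSMutableArray array];
    }
    return self;
}
- (void)addChild:(Node *)child {
    child.parent = self;            // weak: no cycle is created
    [self.children addObject:child];
}
@end

// As noted above: if you hold only B1 and A's last strong reference goes away,
// A (and with it B2 and B3) is deallocated, and B1.parent becomes nil.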
So ARC targets the same sort of problems GC targets. However, ARC works on a more limited set of usage patterns than GC, so if you took a GC language (like Java) and grafted something like ARC onto it, some programs wouldn't work any more (or would at least generate tons of abandoned memory, and might cause serious swapping issues or run out of memory or swap space).
You can also say ARC puts a bigger priority on performance (or maybe predictability) while GC puts a bigger priority on being a generic solution. As a result GC has less predictable CPU/memory demands, and a lower performance (normally) than ARC, but can handle any usage pattern. ARC will work much better for many many common usage patterns, but for a few (valid!) usage patterns it will fall over and die.
Magic
But more specifically, ARC works by doing exactly what you would do with your code (with certain minor differences). ARC is a compile-time technology, unlike GC, which is a runtime technology and impacts your performance negatively. ARC tracks the references to objects for you and synthesizes the retain/release/autorelease calls according to the normal rules. Because of this, ARC can also release things as soon as they are no longer needed, rather than throwing them into an autorelease pool purely for convention's sake.
Some other improvements include zeroing weak references, automatic copying of blocks to the heap, speedups across the board (6x for autorelease pools!).
More detailed discussion about how all this works is found in the LLVM Docs on ARC.
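As a rough illustration of that rewriting (Widget and Example are made-up classes, and the inserted call is shown in source terms; the real compiler emits optimized runtime calls such as objc_release rather than message sends):

@interface Widget : NSObject
- (void)doSomething;
@end

@interface Example : NSObject
- (void)useWidget;
@end

@implementation Example
- (void)useWidget {
    Widget *w = [[Widget alloc] init];  // +1 ownership from alloc
    [w doSomething];
    // Under ARC the compiler inserts the equivalent of [w release] right
    // here, at the end of w's lifetime, instead of deferring the object
    // to an autorelease pool.
}
@end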
It differs greatly from garbage collection. Have you seen the warnings that tell you that you may be leaking objects on particular lines? Those warnings even tell you on which line you allocated the object. ARC takes this a step further and inserts retain/release statements in the proper locations, better than most programmers would, almost 100% of the time. Occasionally there are some weird instances of retained objects that you need to help it out with.
This is very well explained in Apple's developer documentation. Read "How ARC Works":
To make sure that instances don’t disappear while they are still needed, ARC tracks how many properties, constants, and variables are currently referring to each class instance. ARC will not deallocate an instance as long as at least one active reference to that instance still exists.
To learn the difference between garbage collection and ARC, read this.
ARC is a compiler feature that provides automatic memory management of objects.
Instead of you having to remember when to use retain, release, and autorelease, ARC evaluates the lifetime requirements of your objects and automatically inserts appropriate memory management calls for you at compile time. The compiler also generates appropriate dealloc methods for you.
The compiler inserts the necessary retain/release calls at compile time, but those calls are executed at runtime, just like any other code.
For those who are new to iOS development and don't have work experience with Objective-C:
Please refer to Apple's Advanced Memory Management Programming Guide for a better understanding of memory management.

How to add memory to the heap at runtime?

I am using Keil's ARM-MDK 4.11. I have a statically allocated block of memory that is used only at startup. It is used before the scheduler is initialised and due to the way RL-RTX takes control of the heap-management, cannot be dynamically allocated (else subsequent allocations after the scheduler starts cause a hard-fault).
I would like to add this static block as a free block to the system heap after the scheduler is initialised. It would seem that __Heap_ProvideMemory() might provide the answer; it is called during initialisation to create the initial heap. However, that would require knowledge of the heap descriptor address, and I can find no documented method of obtaining that.
Any ideas?
I have raised a support request with ARM/Keil for this, but they are more interested in questioning why I would want to do this, and offering alternative solutions. I am well aware of the alternatives, but in this case if this could be done it would be the cleanest solution.
We use the Rowley Crossworks compiler but had a similar issue - the heap was being set up in the compiler CRT startup code. Unfortunately the SDRAM wasn't initialised till the start of main() and so the heap wasn't set up properly. I worked around it by reinitialising the heap at the start of main(), after the SDRAM was initialised.
I looked at the assembler code that the compiler uses at startup to work out the structure - it wasn't hard. Subsequently I have also obtained the malloc/free source code from Rowley - perhaps you could ask Keil for their version?
One method I've used is to incorporate my own simple heap routines and take over the malloc()/calloc()/free() functions from the library.
The simple, custom heap routines had an interface that allowed adding blocks of memory to the heap.
The drawback to this (at least in my case) was that the custom heap routines were far less sophisticated than the built-in library routines and were probably more prone to fragmentation than the built-in routines. That wasn't a serious issue in that particular application. If you want the capabilities of the built-in library routines, you could probably have your malloc() defer to the built-in heap routines until it returns a failure, then try to allocate from your custom heap.
Another drawback is that I found it much more painful to make sure the custom routines were bug-free than I thought it would be at first glance, even though I wasn't trying to do anything too fancy (just a simple list of free blocks that could be split on allocation and coalesced when freed).
The one benefit to this technique is that it's pretty portable (as long as your custom routines are portable) and doesn't break if the toolchain changes its internals. The only part that requires porting is taking over the malloc()/free() interface and making sure you get initialized early enough.
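A hedged sketch of that fallback idea in plain C. The my_heap_* routines are hypothetical names for the custom heap, and the symbol wrapping is shown GNU-ld style (link with --wrap=malloc --wrap=free); Keil's armlink offers a similar patching mechanism via $Sub$$ / $Super$$ symbols:

#include <stddef.h>

extern void *my_heap_alloc(size_t size);   /* hypothetical custom heap */
extern void  my_heap_free(void *ptr);
extern int   my_heap_owns(const void *ptr);

void *__real_malloc(size_t size);          /* the library allocator */
void  __real_free(void *ptr);

void *__wrap_malloc(size_t size)
{
    void *p = __real_malloc(size);         /* prefer the built-in heap */
    if (p == NULL)
        p = my_heap_alloc(size);           /* fall back to the custom heap */
    return p;
}

void __wrap_free(void *ptr)
{
    if (my_heap_owns(ptr))
        my_heap_free(ptr);                 /* route back to whichever heap owns it */
    else
        __real_free(ptr);
}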

Avoiding, finding and removing memory leaks in Cocoa

Memory (and resource) leaks happen. How do you make sure they don't?
What tips & techniques would you suggest to help avoid creating memory leaks in first place?
Once you have an application that is leaking how do you track down the source of leaks?
(Oh and please avoid the "just use GC" answer. Until the iPhone supports GC this isn't a valid answer, and even then - it is possible to leak resources and memory on GC)
In Xcode 4.5, use the built-in Static Analyzer.
In versions of Xcode prior to 3.3, you might have to download the static analyzer. These links show you how:
Use the LLVM/Clang Static Analyzer
To avoid creating memory leaks in the first place, use the Clang Static Analyzer to -- unsurprisingly -- analyse your C and Objective-C code (no C++ yet) on Mac OS X 10.5. It's trivial to install and use:
Download the latest version from this page.
From the command-line, cd to your project directory.
Execute scan-build -k -V xcodebuild.
(There are some additional constraints etc.; in particular you should analyze a project in its "Debug" configuration -- see http://clang.llvm.org/StaticAnalysisUsage.html for details -- but that's more-or-less what it boils down to.)
The analyser then produces a set of web pages for you that shows likely memory management and other basic problems that the compiler is unable to detect.
If your project does not target Mac OS X desktop, there are a couple of other details:
Set the Base SDK for All Configurations to an SDK that uses the Mac OS X desktop frameworks...
Set the Command Line Build to use the Debug configuration.
(This is largely the same answer as to this question.)
Don't overthink memory management
For some reason, many developers (especially early on) make memory management more difficult for themselves than it ever need be, frequently by overthinking the problem or imagining it to be more complicated than it is.
The fundamental rules are very simple. You should concentrate just on following those. Don't worry about what other objects might do, or what the retain count is of your object. Trust that everyone else is abiding by the same contract and it will all Just Work.
In particular, I'll reiterate the point about not worrying about the retain count of your objects. The retain count itself may be misleading for various reasons. If you find yourself logging the retain count of an object, you're almost certainly heading down the wrong path. Step back and ask yourself, are you following the fundamental rules?
Always use accessor methods; declare accessors using properties
You make life much simpler for yourself if you always use accessor methods to assign values to instance variables (except in init* and dealloc methods). Apart from ensuring that any side-effects (such as KVO change notifications) are properly triggered, it makes it much less likely that you'll suffer a copy-and-paste or some other logic error than if you sprinkle your code with retains and releases.
When declaring accessors, you should always use the Objective-C 2 properties feature. The property declarations make the memory management semantics of the accessors explicit. They also provide an easy way for you to cross-check with your dealloc method to make sure that you have released all the properties you declared as retain or copy.
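For example (MRC-era declarations; Person and its properties are made up), the declared semantics line up one-to-one with the releases in dealloc, which makes the cross-check easy:

@interface Person : NSObject
@property (nonatomic, retain) NSDate *birthdate;  // semantics explicit: retain
@property (nonatomic, copy) NSString *name;       // semantics explicit: copy
@end

@implementation Person
@synthesize birthdate, name;

- (void)dealloc {
    // one release per retain/copy property declared above
    [birthdate release];
    [name release];
    [super dealloc];
}
@end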
The Instruments Leaks tool is pretty good at finding a certain class of memory leak. Just use "Start with Performance Tool" / "Leaks" menu item to automatically run your application through this tool. Works for Mac OS X and iPhone (simulator or device).
The Leaks tool helps you find sources of leaks, but doesn't help so much with tracking down where the leaked memory is being retained.
Follow the rules for retaining and releasing (or use Garbage Collection). They're summarized here.
Use Instruments to track down leaks. You can run an application under Instruments by using Build > Start With Performance Tool in Xcode.
I remember using a tool by Omni a while back when I was trying to track down some memory leaks that would show all retain/release/autorelease calls on an object. I think it showed stack traces for the allocation as well as all retains and releases on the object.
http://www.omnigroup.com/developer/omniobjectmeter/
First of all, it's vitally important that your use of [ ] and { } brackets and braces match the universal standard. OK, just kiddin'.
When looking at leaks, you can assume the leak is due to a problem in your code, but it isn't 100% of the time your fault. In some cases, there may be something happening in Apple's (gasp!) code that is at fault. And it may be something that's hard to find, because it doesn't show up as Cocoa objects being allocated. I've reported leak bugs to Apple in the past.
Leaks are sometimes hard to find because the clues you find (e.g. hundreds of strings leaked) may happen not because those objects directly responsible for the strings are leaking, but because something is leaking that object. Often you have to dig through the leaves and branches of a leaking 'tree' in order to find the 'root' of the problem.
Prevention: One of my main rules is to really, really, really avoid ever allocating an object without just autoreleasing it right there on the spot. Anywhere that you alloc/init an object and then release it later on down in the block of code is an opportunity for you to make a mistake. Either you forget to release it, or you throw an exception so that the release never gets called, or you put a 'return' statement for early exit somewhere in the method (something I try to avoid also).
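A small sketch of that rule under MRC (the validate: helper and surrounding methods are invented):

// Fragile: the release is far from the alloc; every early exit leaks.
- (NSString *)fragileName {
    NSMutableString *name = [[NSMutableString alloc] init];
    [name appendString:@"Jane"];
    if (![self validate:name]) {
        return nil;                    // leak: name was never released
    }
    NSString *result = [[name copy] autorelease];
    [name release];
    return result;
}

// Robust: the ownership question is settled at the allocation site.
- (NSString *)robustName {
    NSMutableString *name = [[[NSMutableString alloc] init] autorelease];
    [name appendString:@"Jane"];
    if (![self validate:name]) {
        return nil;                    // no leak: the pool releases name
    }
    return [[name copy] autorelease];
}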
You can build the beta port of Valgrind from here: http://www.sealiesoftware.com/valgrind/
It's far more useful than any static analysis, but doesn't have any special Cocoa support yet that I know of.
Obviously you need to understand the basic memory management concepts to begin with. But in terms of chasing down leaks, I highly recommend reading this tutorial on using the Leaks mode in Instruments.