Best way to handle and report memory allocation errors due to integer overflow in Objective-C?

To begin with, let me say that I understand how and why the problem I'm describing can happen. I was a Computer Science major, and I understand overflow/underflow and signed/unsigned arithmetic. (For those unfamiliar with the topic, Apple's Secure Coding Guide discusses integer overflow briefly.)
My question is about reporting and recovering from such an error once it has been detected, and more specifically in the case of an Objective-C framework. (I write and maintain CHDataStructures.) I have a few collections classes that allocate memory for storing objects and dynamically expand as necessary. I haven't yet seen any overflow-related crashes, probably because my test cases mostly use sane data. However, given unvalidated values, things could explode rather quickly, and I want to prevent that.
I have identified at least two common cases where this can occur:
(1) The caller passes a very large unsigned value (or negative signed value) to -initWithCapacity:.
(2) Enough objects have been added to cause the capacity to dynamically expand, and the capacity has grown large enough to cause overflow.
The easy part is detecting whether overflow will occur. (For example, before attempting to allocate length * sizeof(void*) bytes, I can check whether length <= UINT_MAX / sizeof(void*), since failing this test will mean that the product will overflow and potentially allocate a much smaller region of memory than desired. On platforms that support it, the checkint.h API is another alternative.) The harder part is determining how to deal with it gracefully. In the first scenario, the caller is perhaps better equipped (or at least in the mindset) to deal with a failure. The second scenario can happen anywhere in the code that an object is added to the collection, which may be quite non-deterministic.
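For concreteness, a guard along the lines described above might look like the following. This is only a minimal sketch: the function name is hypothetical, and it assumes the product is computed in size_t (hence SIZE_MAX rather than UINT_MAX).

#import <Foundation/Foundation.h>
#include <stdint.h>

// Returns YES if length * sizeof(void *) would wrap around before the allocator sees it.
static BOOL CHMultiplyWouldOverflow(size_t length) {
    return length > SIZE_MAX / sizeof(void *);
}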
My question, then, is this: How is "good citizen" Objective-C code expected to act when integer overflow occurs in this type of situation? (Ideally, since my project is a framework in the same spirit as Foundation in Cocoa, I'd like to model off of the way it behaves for maximum "impedance matching". The Apple documentation I've found doesn't mention much at all about this.) I figure that in any case, reporting the error is a given. Since the APIs to add an object (which could cause scenario 2) don't accept an error parameter, what can I really do to help resolve the problem, if anything? What is really considered okay in such situations? I'm loath to knowingly write crash-prone code if I can do better...

Log and raise an exception.
You can only really be a good citizen to other programmers, not the end user, so pass the problem upstairs and do it in a way that clearly explains what is going on, what the problem is (give numbers) and where it is happening so the root cause can be removed.
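As a hedged sketch of what "pass the problem upstairs" could look like inside a collection method (the variable name requestedCapacity and the choice of NSRangeException are illustrative, not taken from CHDataStructures):

// Hypothetical guard inside an -initWithCapacity: or add path, before allocating.
NSUInteger limit = SIZE_MAX / sizeof(void *);
if (requestedCapacity > limit) {
    NSLog(@"-[%@ %@]: requested capacity %lu exceeds safe limit %lu",
          [self class], NSStringFromSelector(_cmd),
          (unsigned long)requestedCapacity, (unsigned long)limit);
    [NSException raise:NSRangeException
                format:@"Requested capacity %lu would overflow (limit %lu)",
                       (unsigned long)requestedCapacity, (unsigned long)limit];
}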

There are two issues at hand:
(1) An allocation has failed and you are out of memory.
(2) You have detected an overflow or other erroneous condition that will lead to (1) if you continue.
In the case of (1), you are hosed (unless the failed allocation was both stupidly large and you know that only that one allocation would fail). If this happens, the best thing you can do is to crash as quickly as possible and leave behind as much evidence as you can. In particular, creating a function with a name like IAmCrashingOnPurposeBecauseYourMemoryIsDepleted() that calls abort() will leave evidence in the crash log.
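A minimal sketch of that idea (the noreturn attribute is my addition; the point is simply that the function's name ends up in the crash log's backtrace):

#include <stdlib.h>

__attribute__((noreturn))
static void IAmCrashingOnPurposeBecauseYourMemoryIsDepleted(void) {
    // abort() produces a crash report whose backtrace includes this frame,
    // so the reason for the crash is visible at a glance.
    abort();
}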
If it is really (2), then there are additional questions. Specifically, can you recover from the situation and, regardless, is the user's data still intact? If you can recover, then grand... do so and the user never has to know. If not, then you need to make absolutely sure that the user's data is not corrupt. If it isn't, then save and die. If the user's data is corrupt, then do your best to not persist the corrupted data and let the user know that something has gone horribly wrong. If the user's data is already persisted, but corrupt, then... well... ouch... you might want to consider creating a recovery tool of some kind.

With regards to dynamically growing, array-based storage, there's only so much that can be done. I'm a developer on the Moab scheduler for supercomputers, and we also deal with very large numbers on systems with thousands of processors, thousands of jobs, and massive amounts of job output. At some point, you can't declare a buffer to be any bigger, without creating a whole new data-type to deal with sizes larger than UINT_MAX, or LONG_LONG_MAX etc., at which point on most "normal" machines you'll be running out of stack/heap space anyway. So I'd say log a meaningful error-message, keep the collection from exploding, and if the user needs to add that many things to a CHDataStructures collection, they ought to know that there are issues dealing with very large numbers, and the caller ought to check whether the add was successful (keep track of the size of the collection, etc.).
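If the collection clamps its growth rather than exploding, the caller-side check suggested above could be as simple as comparing counts. This is only a sketch; collection and newObject are placeholders.

NSUInteger countBefore = [collection count];
[collection addObject:newObject];
if ([collection count] == countBefore) {
    // The add was silently refused; the collection is at its size limit.
    NSLog(@"Failed to add %@: collection has reached its maximum capacity", newObject);
}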
Another possibility is to convert array-based storage to dynamically allocated, linked-list-based storage when you get to the point when you can't allocate a larger array with an unsigned int or unsigned long. This would be expensive, but would happen rarely enough that it shouldn't be terribly noticeable to users of the framework. Since the limit on the size of a dynamically allocated, linked-list-based collection is the size of the heap, any user that added enough items to a collection to "overflow" it then would have bigger problems than whether or not his item was successfully added.

I'd say the correct thing to do would be to do what the Cocoa collections do. For example, if I have the following code:
int main (int argc, const char * argv[]) {
    NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];
    NSMutableArray * a = [[NSMutableArray alloc] init];
    for (uint32_t i = 0; i < ULONG_MAX; ++i) {
        for (uint32_t j = 0; j < 10000000; ++j) {
            [a addObject:@"foo"];
        }
        NSLog(@"%lu rounds of 10,000,000 completed", (unsigned long)(i + 1));
    }
    [a release];
    [pool drain];
    return 0;
}
...and just let it run, it will eventually die with EXC_BAD_ACCESS. (I compiled and ran this as a 32-bit app so I could be sure to run out of space when I hit 2**32 objects.)
In other words, throwing an exception would be nice, but I don't think you really have to do anything.

Using assertions and a custom assertion handler may be the best available option for you.
With assertions, you could easily have many checkpoints in your code, where you verify that things work as they should. If they don't, by default the assertion macro logs the error (developer-defined string), and throws an exception. You can also override the default behavior using a custom assertion handler and implement a different way to handle error conditions (even avoid throwing exceptions).
This approach allows for a greater degree of flexibility and you can easily modify your error handling strategy (throwing exceptions vs. dealing with errors internally) at any point.
The documentation is very concise: Assertions and Logging.
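For illustration, a custom handler might look roughly like this. It is a sketch under the assumption that you want to log and continue instead of throwing; the class name is made up, and note that it only affects the NSAssert-style macros, not the C assert().

@interface CHLoggingAssertionHandler : NSAssertionHandler
@end

@implementation CHLoggingAssertionHandler
- (void)handleFailureInMethod:(SEL)selector
                       object:(id)object
                         file:(NSString *)fileName
                   lineNumber:(NSInteger)line
                  description:(NSString *)format, ...
{
    NSLog(@"Assertion failed in -[%@ %@] (%@:%ld)",
          [object class], NSStringFromSelector(selector), fileName, (long)line);
    // Deliberately does not throw; adjust this decision to taste.
}
@end

// Install it for the current thread before the code you want to check:
[[[NSThread currentThread] threadDictionary]
    setObject:[[[CHLoggingAssertionHandler alloc] init] autorelease]
       forKey:NSAssertionHandlerKey];

Because NSAssertionHandlerKey is looked up per thread, you can scope this to just the threads where thrown exceptions would be unwelcome.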

Related

Read line in Rust

What is the reason for such an implementation of the io::stdin().read_line(&mut input) method? Why not just return a Result with the appropriate error or input? Why pass &mut input? Does this approach have any advantages?
The huge advantage of this is that because you're passing a mutable reference to an existing String, you can reuse a buffer, or pre-allocate it as needed:
// Bring the BufRead trait into scope so read_line() is available on the locked handle
use std::io::{self, BufRead};

// 2K is a good default buffer size, but definitely do
// analyze the situation and adjust its size accordingly
let mut buffer = String::with_capacity(2048);
// Lock our standard input to eliminate synchronization overhead (unlocks when dropped)
let mut stdin = io::stdin().lock();
// Read our first line.
stdin.read_line(&mut buffer)?;
// This is a stand-in for any function that takes an &str
process_line_1(&buffer);
// Discard the data we've read, but retain the buffer that we have
buffer.clear();
// Reading a second line will reuse the memory allocation:
stdin.read_line(&mut buffer)?;
process_line_2(&buffer)?;
Remember: allocating too much is a lot more efficient than allocating too often. Buffer sharing across different functions may be a little unwieldy due to Rust's borrowing rules (my advice is to have a "cache struct" that keeps empty pre-allocated buffers for a specific function or a collection of APIs), but if you're creating and destroying the buffer within one function, there's minimal work required to get this caching set up and varying potential for performance benefit from caching allocations like this.
What's great about this API is that it not only enables buffer reuse in a clean way, it also encourages it. If you have several .read_line() calls in a row, it immediately feels wrong to create a new buffer for every call. The API design teaches you how to use it efficiently without saying a word. The takeaway is that this tiny trick doesn't just improve performance of I/O code in Rust, it also attempts to guide beginners towards designing their own APIs in this manner, although allocation reuse is sadly often overlooked in third-party APIs. [citation needed]
I believe it has to do with considerations about where the line read will live: on the heap or maybe on the stack of the caller or wherever the caller wants it to be.
Note that a function has no way to return a reference to a value that lives on its own stack, as that value wouldn't live long enough. So the only other option would be to allocate it on the heap or copy the whole thing around, neither of which is desirable from the POV of the caller.
(Please take into account that I am a rust beginner myself, so this answer may be totally wrong. In which case I'm ready to delete it.)

vkAllocateDescriptorSets returns VK_OUT_OF_HOST_MEMORY

I wrote Vulkan code on my laptop that worked; then I got a new laptop, and now when I run it the program aborts because vkAllocateDescriptorSets() returns VK_OUT_OF_HOST_MEMORY.
I doubt that it is actually out of memory, and I know it can allocate some memory because vkCreateInstance() doesn't fail like in this Stack Overflow post: Vulkan create instance VK_OUT_OF_HOST_MEMORY.
EDIT: Also, I forgot to mention, vkAllocateDescriptorSets() only returns VK_OUT_OF_HOST_MEMORY the second time I run it.
vkAllocateDescriptorSets allocates descriptors from a pool. So while such allocation could fail due to a lack of host/device memory, there are two other things that could cause failure. There may simply not be enough memory in the pool to allocate the number of descriptors/sets you asked for. Or there could be enough memory, but repeated allocations/deallocations have fragmented the pool such that the allocations cannot be made.
The case of allocating more descriptors/sets than are available should never happen. After all, you know how many descriptors and sets you put into that pool, so you should know exactly when you'll run out. This is an error state that a working application can guarantee it will never encounter. Though the VK_KHR_maintenance1 extension did add support for this circumstance: VK_ERROR_OUT_OF_POOL_MEMORY_KHR.
However, if you've screwed up your pool creation in some way, you will get this possibility. Of course, since there's no error code for it (outside of the extension), the implementation will have to provide a different error code: either host or device memory exhaustion.
But again, this is a programming error on your part, not something you should ever expect to see with working code. In particular, even if you request that extension, do not keep allocating from a pool until it stops giving you memory. That's just bad coding.
For the fragmentation case, they do have an error code: VK_ERROR_FRAGMENTED_POOL. However, the Khronos Group screwed up. See, the first few releases of Vulkan didn't include this error code; it was added later. Which means that implementations from before the adding of this error code (and likely afterwards) had to pick an inappropriate error code to return. Again, either host or device memory.
So you basically have to treat any failure of this function as either fragmentation, programming error (ie: you asked for more stuff than you put into the pool), or something else. In all cases, it's not something you can recover from at runtime.
Since it appeared to work once, odds are good that you probably just allocated more stuff than the pool contains. So you should make sure that you add enough stuff to the pool before allocating from it.
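For what it's worth, a hedged sketch of sizing the pool up front so the allocations can never exceed it; the descriptor type and counts here are placeholders, and device is assumed to be a valid VkDevice you created elsewhere.

#include <string.h>          /* memset */
#include <vulkan/vulkan.h>

/* One pool sized to cover every descriptor set we will ever allocate from it. */
VkDescriptorPoolSize poolSizes[1];
poolSizes[0].type = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER;
poolSizes[0].descriptorCount = 16;          /* total descriptors of this type */

VkDescriptorPoolCreateInfo poolInfo;
memset(&poolInfo, 0, sizeof(poolInfo));
poolInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO;
poolInfo.maxSets = 16;                      /* total descriptor sets, ever */
poolInfo.poolSizeCount = 1;
poolInfo.pPoolSizes = poolSizes;

VkDescriptorPool pool;
if (vkCreateDescriptorPool(device, &poolInfo, NULL, &pool) != VK_SUCCESS) {
    /* handle creation failure */
}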
The problem was that I had not allocated enough memory in the pool. I solved it by creating multiple pools. One for each descriptor set.

Does class_getInstanceSize have a known bug about returning incorrect sizes?

Reading through the other questions that are similar to mine, I see that most people want to know why you would need to know the size of an instance, so I'll go ahead and tell you although it's not really central to the problem. I'm working on a project that requires allocating thousands to hundreds of thousands of very small objects, and the default allocation pattern for objects simply doesn't cut it. I've already worked around this issue by creating an object pool class, that allows a tremendous amount of objects to be allocated and initialized all at once; deallocation works flawlessly as well (objects are returned to the pool).
It actually works perfectly and isn't my issue, but I noticed class_getInstanceSize was returning unusually large sizes. For instance, a class that stores one size_t and two (including isA) Class instance variables is reported to be 40-52 bytes in size. I give a range because calling class_getInstanceSize multiple times, even in a row, has no guarantee of returning the same size. In fact, every object but NSObject seemingly reports random sizes that are far from what they should be.
As a test, I tried:
printf("Instance Size: %zu\n", class_getInstanceSize(objc_getClass("MyClassName"));
That line of code always returns a value that corresponds to the size that I've calculated by hand to be correct. For instance, the earlier example comes out to 12 bytes (32-bit) and 24 bytes (64-bit).
Thinking that the runtime may be doing something behind the scenes that requires more memory, I watched the actual memory use of each object. For the example given, the only memory read from or written to is in that 12/24 byte block that I've calculated to be the expected size.
class_getInstanceSize acts like this on both the Apple and GNU 2.0 runtimes. So is there a known bug with class_getInstanceSize that causes this behavior, or am I doing something fundamentally wrong? Before you blame my object pool: I've tried this same test in a brand new project, using both the traditional alloc class method and allocating the object with class_createInstance(self, 0); in a custom class method.
Two things I forgot to mention before: I'm almost entirely testing this on my own custom classes, so I know the trickery isn't down to the class actually being a class cluster or any of that nonsense; second, class_getInstanceSize([MyClassName class]) and class_getInstanceSize(self) (the latter run inside a class method) rarely produce the same result, despite both simply referencing isA. Again, this happens in both runtimes.
I think I've solved the problem and it was due to possibly the dumbest reason ever.
I use a profiling/debugging library that is old; in fact, I don't know its actual name (the library is libcsuomm; the header for it has no identifying info). All I know about it is that it was a library available on the computers in the compsci labs (I did a year of Comp-Sci before switching to a Geology major, graduating and never looking back).
Anyway, the point of the library is that it provides a number of profiling and debugging facilities; the one I use it most for is memory-leak detection, since it actually tracks leaks per object, unlike my other favorite memory-leak library (MSS, now unsupported), which is C-based and not aware of objects beyond raw allocations.
Because I use it so much when debugging, I always set it up by default without even thinking about it. So even when creating my test projects to try and pinpoint the bug, I set it up without even putting any thought into it. Well, it turns out that the library works by pulling some runtime trickery, so it can properly track objects. Things seem to work correctly now that I've disabled it, so I believe that it was the source of my problems.
Now I feel bad about jumping to conclusions about it being a bug, but at the time I couldn't see anything in my own code that could possibly cause that problem.

What do you think about this Objective-C code that iterates through the retain count and calls release on every iteration?

I'm still trying to understand this piece of code that I found in a project I'm working on where the guy that created it left the company before I could ask.
This is the code:
- (void)releaseMySelf {
    for (int i = myRetainCount; i > 1; i--) {
        [self release];
    }
    [self autorelease];
}
As far as I know, in the Objective-C memory management model, the first rule is that the object that allocates another object is also responsible for releasing it in the future. That's the reason I don't understand the meaning of this code. Is there any meaning?
The author is trying to work around memory management rather than understand it. He assumes that an object has a retain count that is increased by each retain, and so tries to decrease it by calling that number of releases. He probably has not implemented the "is also responsible for releasing it in the future" part of the rule you describe.
However, see the many related answers on this topic.
Read Apple's memory management concepts.
The documentation for retainCount includes this warning from Apple:
The retainCount method does not account for any pending autorelease messages sent to the receiver.
Important: This method is typically of no value in debugging memory management issues. Because any number of framework objects may have retained an object in order to hold references to it, while at the same time autorelease pools may be holding any number of deferred releases on an object, it is very unlikely that you can get useful information from this method. To understand the fundamental rules of memory management that you must abide by, read “Memory Management Rules”. To diagnose memory management problems, use a suitable tool: the LLVM/Clang static analyzer can typically find memory management problems even before you run your program. The Object Alloc instrument in the Instruments application (see Instruments User Guide) can track object allocation and destruction. Shark (see Shark User Guide) also profiles memory allocations (amongst numerous other aspects of your program).
Since all answers seem to misread myRetainCount as [self retainCount], let me offer a reason why this code could have been written: It could be that this code is somehow spawning threads or otherwise having clients register with it, and that myRetainCount is effectively the number of those clients, kept separately from the actual OS retain count. However, each of the clients might get its own ObjC-style retain as well.
So this function might be called in a case where a request is aborted, and could just dispose of all the clients at once, and afterwards perform all the releases. It's not a good design, but if that's how the code works, (and you didn't leave out an int myRetainCount = [self retainCount], or overrides of retain/release) at least it's not necessarily buggy.
It is, however, very likely a bad distribution of responsibilities or a kludgey and hackneyed attempt at avoiding retain cycles without really improving anything.
This is a dirty hack to force a memory release: if the rest of your program is written correctly, you never need to do anything like this. Normally, your retains and releases are in balance, so you never need to look at the retain count. What this piece of code says is "I don't know who retained me and forgot to release, I just want my memory to get released; I don't care that the others references would be dangling from now on". This is not going to compile with ARC (oddly enough, switching to ARC may just fix the error the author was trying to work around).
The meaning of the code is to force the object to deallocate right now, no matter what the future consequences may be. (And there will be consequences!)
The code is fatally flawed because it doesn't account for the fact that someone else actually "owns" that object. In other words, something "alloced" that object, and any number of other things may have "retained" that object (maybe a data structure like NSArray, maybe an autorelease pool, maybe some code on the stackframe that just does a "retain"); all those things share ownership in this object. If the object commits suicide (which is what releaseMySelf does), these "owners" suddenly point to bad memory, and this will lead to unexpected behavior.
Hopefully code written like this will just crash. Perhaps the original author avoided these crashes by leaking memory elsewhere.

Is asserting that every object creation succeeded necessary in Objective C?

I have recently read Apple's sample code for MVCNetworking, written by Apple Developer Technical Support guru Quinn "The Eskimo!". The sample is a really nice learning experience with what I guess are best development practices for iOS development.
What surprised me, coming from JVM languages, are extremely frequent assertions like this:
syncDate = [NSDate date];
assert(syncDate != nil);
and this:
photosToRemove = [NSMutableSet setWithArray:knownPhotos];
assert(photosToRemove != nil);
and this:
photoIDToKnownPhotos = [NSMutableDictionary dictionary];
assert(photoIDToKnownPhotos != nil);
Is that really necessary? Is that coding style worth emulating?
If you're used to Java, this may seem strange. You'd expect an object creation message to throw an exception when it fails, rather than return nil. However, while Objective-C on Mac OS X has support for exception handling, it's an optional feature that can be turned on/off with a compiler flag. The standard libraries are written so they can be used without exception handling turned on: hence messages often return nil to indicate errors, and sometimes require you to also pass a pointer to an NSError* variable. (This is for Mac development; I'm not sure whether you can even turn exception handling support on for iOS, considering you also can't turn on garbage collection for iOS.)
The section "Handling Initialization Failure" in the document "The Objective-C Programming Language" explains how Objective-C programmers are expected to deal with errors in object initialization/creation: that is, return nil.
Something like [NSData dataWithContentsOfFile: path] may definitely return nil: the documentation for the method explicitly says so. But I'm honestly not sure whether something like [NSMutableArray arrayWithCapacity: n] ever returns nil. The only situation I can think of when it might is when the application is out of memory. But in that case I'd expect the application to be aborted by the attempt to allocate more memory. I have not checked this though, and it may very well be that it returns nil in this case. While in Objective-C you can often safely send messages to nil, this could then still lead to undesirable results. For example, your application may try to make an NSMutableArray, get nil instead, and then happily continue sending addObject: to nil and write out an empty file to disk rather than one with elements of the array as intended. So in some cases it's better to check explicitly whether the result of a message was nil. Whether doing it at every object creation is necessary, like the programmer you're quoting is doing, I'm not sure. Better safe than sorry perhaps?
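For example, the check could live right where a nil collection would otherwise turn into an empty file. This is a sketch that imagines a method returning a BOOL success flag; items, count, and path are placeholders.

NSMutableArray *items = [NSMutableArray arrayWithCapacity:count];
if (items == nil) {
    NSLog(@"Could not create array with capacity %lu", (unsigned long)count);
    return NO;   // report the failure instead of silently writing an empty file
}
// ... add the objects that should end up on disk ...
return [items writeToFile:path atomically:YES];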
Edit: I'd like to add that while checking that object creation succeeded can sometimes be a good idea, asserting it may not be the best idea. You'd want this to be also checked in the release version of your application, not just in the debug version. Otherwise it kind of defeats the point of checking it, since you don't want the application end user to, for example, wind up with empty files because [NSMutableArray arrayWithCapacity: n] returned nil and the application continued sending messages to the nil return value. Assertions (with assert or NSAssert) can be removed from the release version with compiler flags; Xcode doesn't seem to include these flags by default in the "Release" configuration though. But if you'd want to use these flags to remove some other assertions, you'd also be removing all your "object creation succeeded" checks.
Edit: Upon further reflection, it seems more plausible than I first thought that [NSMutableArray arrayWithCapacity: n] would return nil rather than abort the application when not enough memory is available. Basic C malloc also doesn't abort but returns a NULL pointer when not enough memory is available. But I haven't yet found any clear mention of this in the Objective-C documentation on alloc and similar methods.
Edit: Above I said I wasn't sure checking for nil is necessary at every object creation. But it shouldn't be. This is exactly why Objective-C allows sending messages to nil, which then return nil (or 0 or something similar, depending on the message definition): this way, nil can propagate through your code somewhat similar to an exception so that you don't have to explicitly check for nil at every single message that might return it. But it's a good idea to check for it at points where you don't want it to propagate, like when writing files, interacting with the user and so on, or in cases where the result of sending a message to nil is undefined (as explained in the documentation on sending messages to nil). I'd be inclined to say this is like the "poor man's" version of exception propagation&handling, though not everyone may agree that the latter is better; but nil doesn't tell you anything about why an error occurred and you can easily forget to check for it where such checks are necessary.
Yup, I think it's a good idea. It helps to filter out the edge cases (out of memory, input variables empty/nil) as soon as the variables are introduced, although I am not sure about the impact on speed because of the overhead.
I guess it's a matter of personal choice. Usually asserts are used for debugging purposes, so that the app crashes at the assert points if the conditions are not met. You'd normally want to strip them out of your app releases, though.
I personally am too lazy to place asserts around every block of code as you have shown; I think it's close to being a bit too paranoid. Asserts might be pretty handy in conditions where some uncertainty is involved.
I have also asked this on Apple DevForums. According to Quinn "The Eskimo!" (author of the MVCNetworking sample in question) it is a matter of coding style and his personal preference:
I use lots of asserts because I hate debugging. (...)
Keep in mind that I grew up with traditional Mac OS, where a single rogue pointer could bring down your entire machine (similarly to kernel programming on current systems). In that world it was important to find your bugs sooner rather than later. And lots of asserts help you do that.
Also, even today I spend much of my life dealing with network programs. Debugging network programs is hard because of the asynchrony involved. Asserts help with this, because they are continually checking the state of your program as it runs.
However, I think you have a valid point with stuff like +[NSDate date]. The chances of that returning nil are low. The assert is there purely from habit. But I think the costs of this habit (some extra typing, learning to ignore the asserts) are small compared to the benefits.
From this I gather that asserting that every object creation succeeded is not strictly necessary.
Asserts can be valuable during development to document the preconditions of methods, as a design aid for other maintainers (including your future self). I personally prefer the alternative style: separating specification and implementation using TDD/BDD practices.
Asserts can be used to double-check runtime types of method arguments due to the dynamic nature of Objective C:
assert([response isKindOfClass:[NSHTTPURLResponse class]]);
I'm sure there are more good uses of assertions. All Things In Moderation...