When ARC came to Objective-C, I did my best to read through the Objective-C Automatic Reference Counting (ARC) guide posted on the Clang project website to get a better grasp of what it was about. What I found there (and nowhere else) was mention of using __attribute__ declarations to signify to ARC whether certain code autoreleases its return value, for instance (__attribute__((ns_returns_autoreleased))), or whether it 'consumes' a parameter (__attribute__((ns_consumed))), and so on.
However, the guide says very little about how necessary these declarations actually are. Omitting them seems to make no difference, whether running the static analyzer or running the project itself. Do these even make a difference? Is there any advantage to labeling a method with __attribute__((objc_method_family(new)))? No article I've found on ARC mentions these specifiers at all; perhaps an ARC guru can explain what they are used for.
(Personally, I include all relevant specifiers just in case, but find that they make code obfuscated and messy.)
These attributes are expressly for abnormal cases, such as:
A function or method parameter of retainable object pointer type may be marked as consumed, signifying that the callee expects to take ownership of a +1 retain count.
A function or method which returns a retainable object pointer type may be marked as returning a retained value, signifying that the caller expects to take ownership of a +1 retain count.
You don't normally do these things, so you don't normally use these attributes. With no attributes, the compiler implements and expects the normal behavior: the NARC rule (New, Alloc, Retain, Copy), or, since you can't call retain under ARC, perhaps I should say CAN (Copy, Alloc, New).
There are two reasons to use these attributes:
In order to violate the CAN rule; that is, to have a method not so named that returns a reference, or a method so named that doesn't. The attribute documents the violation in the method's prototype, and may even be necessary to implement it, if the implementation uses ARC.
Working with Core Foundation types, including Core Graphics types. These aren't ARCed, so you need to use the bridging attributes to aid conversion to and from “retainable object pointer” types.
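For illustration, a minimal sketch (hypothetical class and method names) of the kind of declarations being described:

@interface StringPool : NSObject // hypothetical class for illustration
// The callee takes over the +1 retain count on `s`:
- (void)addString:(NSString *) __attribute__((ns_consumed)) s;
// Returns +1 despite not being named alloc/new/copy/mutableCopy:
- (NSString *)cachedString __attribute__((ns_returns_retained));
@end

For the Core Foundation side, bridging happens with the __bridge, __bridge_retained, and __bridge_transfer casts (or the CFBridgingRetain()/CFBridgingRelease() functions), which document ownership transfer across the ARC boundary in the same way.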
That's not necessary in most cases, since LLVM and Clang know the ObjC naming conventions. So if you follow the standard naming conventions of Cocoa, LLVM automagically assumes the corresponding method family and return-value memory policy.
Namely, if you declare a method named initWith..., it is automatically considered part of the "init" family of methods; there is no need to specify __attribute__((objc_method_family(init))), as Clang detects it automatically. The same goes for the new family, and so on.
In fact, you only need the __attribute__ specifiers when Clang can't guess such cases, which rarely occurs in practice (I have never had to use them), or when you don't respect the naming conventions:
Quoting Clang Language Extensions Documentation:
Many methods in Objective-C have conventional meanings determined by their selectors. For the purposes of static analysis, it is sometimes useful to be able to mark a method as having a particular conventional meaning despite not having the right selector, or as not having the conventional meaning that its selector would suggest. For these use cases, we provide an attribute to specifically describe the method family that a method belongs to.
So as soon as you respect the naming conventions (which you should always do) you won't have anything to do.
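For example, a minimal sketch (hypothetical method name) of the one case where the attribute is needed, a method whose name lies about its semantics:

@interface CustomerStore : NSObject // hypothetical class for illustration
// Starts with "new" but returns an autoreleased object; the attribute
// opts it out of the `new` family so ARC callers don't over-release:
- (NSString *)newCustomerName __attribute__((objc_method_family(none)));
@end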
You should definitely stick to naming conventions wherever possible.
It's clearer to read.
Attributes can introduce build errors if there is a conflict.
ARC semantics combined with attributes are relatively fragile.
I have experience with pure ARC coding. As a compiler feature, it honors the Objective-C method families, inserting the right retain/release calls wherever needed.
All methods whose names start with alloc, mutableCopy, copy, or new create a new object and return it with a +1 retain count. As a consequence, ARC will release that pointer (and hence the object associated with it) when I no longer need it.
I think that problems could arise when I write methods that do not follow naming conventions. For example, if I write a method like newCustomer that in a first version returns an autoreleased object while in a second version does not, what could happen?
In particular, my questions are the following (they belong to the same reasoning):
What happens if the calling and called code are both compiled with ARC?
(a) What happens if the calling code is compiled with ARC while the called is compiled with non-ARC?
(b) What happens if the calling code is compiled with non-ARC while the called is compiled with ARC?
An answer that shows how ARC works under the hood (objc_release, objc_retainAutoreleasedReturnValue, etc.) would be appreciated.
Thank you in advance.
A method named newCustomer falls within the new method family and is thus implicitly marked as returning a retained object. When both the calling and the called code are compiled with ARC, ARC balances the extra retain with a release in the caller:
When returning from such a function or method, ARC retains the value at the point of evaluation of the return statement, before leaving all local scopes.
When receiving a return result from such a function or method, ARC releases the value at the end of the full-expression it is contained within, subject to the usual optimizations for local values.
(Source: Clang's Objective-C Automatic Reference Counting specification.)
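As a rough sketch of the fully-ARC case (simplified; the actual codegen differs in detail):

// Callee (ARC): `newCustomer` is in the `new` family, so it returns +1.
- (Customer *)newCustomer {
    return [[Customer alloc] init]; // the +1 from alloc is transferred to the caller
}

// Caller (ARC): conceptually the compiler emits something like
//   Customer *c = objc_msgSend(store, @selector(newCustomer)); // arrives at +1
//   ... use c ...
//   objc_release(c); // balances the transferred retain
// For a conventionally named (+0) method, the callee would instead hand the
// result back via objc_autoreleaseReturnValue() and the caller would claim it
// with objc_retainAutoreleasedReturnValue().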
If newCustomer is implemented with manual reference counting and violates the naming convention (i.e., does not return a retained object), then the caller can either over-release or under-release, depending on the circumstances.
If the caller uses ARC, then the object returned from newCustomer will be over-released, likely causing the program to crash. This is because the calling code performs the second half of the process quoted above without a corresponding retain ever having been performed.
If the calling code is not compiled with ARC, but the called code is (thus correctly returning a retained object), then the behavior depends on the programmer following the naming conventions: if they release the returned value, the object's reference count will be correctly managed. However, if the programmer assumes that their new... method violates the naming convention (i.e., returns an autoreleased object) and fails to insert a release in the calling code, the returned object will leak.
All in all, as Martin R. points out in the comments, the critical determination is whether the naming conventions are followed in every environment involved, including manual reference counting.
Just like any other language, when you violate some of the basic assumptions of the language you wander into the area of undefined behavior. At some point in the future, Apple may modify the internals of how -new... does reference counting. It is up to Apple to make sure that code which conforms to the expected use works, but they won't do that for non-conforming uses.
If you need to know what the actual behavior is for a particular version of a compiler running on a particular system, then you must test for it. Don't assume that behavior will be the same for other compilers or versions of the runtime.
In the end, undefined behavior is undefined behavior. When you build code that relies on it, you will eventually be bitten by a subtle and difficult-to-diagnose defect.
I have ARC code of the following form:
NSMutableData* someData = [NSMutableData dataWithLength:123];
...
CTRunGetGlyphs(run, CFRangeMake(0, 0), someData.mutableBytes);
...
const CGGlyph *glyphs = [someData mutableBytes];
...
...followed by code that reads memory from glyphs but does nothing with someData, which isn't referenced anymore. Note that CGGlyph is not an object type but an unsigned integer.
Do I have to worry that the memory in someData might get freed before I am done with glyphs (which is actually just pointing inside someData)?
All this code is WITHIN the same scope (i.e., a single selector), and glyphs and someData both fall out of scope at the same time.
PS In an earlier draft of this question I referred to 'garbage collection', which didn't really apply to my project. That's why some answers below give it equal treatment with what happens under ARC.
You are potentially in trouble whether you use GC or, as others have recommended instead, ARC. What you are dealing with is an internal pointer which is not considered an owning reference in either GC or ARC in general - unless the implementation has special-cased NSData. Without that owning reference either GC or ARC might remove the object. The problem you face is peculiar to internal pointers.
As you describe your situation, the safest thing to do is to hang onto the real reference. You could do this by assigning the NSData reference to either an instance variable or a static (method-local if you wish) variable, and then assigning nil to that variable when you're done with the internal pointer. In the case of a static, be careful about concurrency!
In practice your code will probably work in both GC and ARC, probably more likely in ARC, but either could conceivably bite you especially as compilers change. For the cost of one variable declaration and one extra assignment you avoid the problem, cheap insurance.
[See this discussion as an example of short lifetime under ARC.]
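A minimal sketch of the workaround described above (hypothetical class and method names):

#import <Foundation/Foundation.h>
#import <CoreText/CoreText.h>

@interface GlyphReader : NSObject // hypothetical class for illustration
@end

@implementation GlyphReader {
    NSMutableData *_glyphStorage; // owning reference ARC can see
}

- (void)readGlyphsFromRun:(CTRunRef)run {
    _glyphStorage = [NSMutableData dataWithLength:123];
    CTRunGetGlyphs(run, CFRangeMake(0, 0), _glyphStorage.mutableBytes);
    const CGGlyph *glyphs = _glyphStorage.mutableBytes;
    // ... read from glyphs; the ivar keeps the object alive throughout ...
    _glyphStorage = nil; // done with the internal pointer
}
@end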
Under actual, real garbage collection that code is potentially a problem. Objects may be released as soon as there is no longer any reference to them and the compiler may discard the reference at any time if you never use it again. For optimisation purposes scope is just a way of putting an upper limit on that sort of thing, not a way of dictating it absolutely.
You can use NSAllocateCollectable to attach lifecycle calculation to C primitive pointers, though it's messy and slightly convoluted.
Garbage collection was never implemented in iOS and is now deprecated on the Mac (as referenced at the very bottom of this FAQ), in both cases in favour of automatic reference counting (ARC). ARC adds retains and releases where it can see that they're implicitly needed. Sadly it can perform some neat tricks that weren't previously possible, such as retrieving objects from the autorelease pool if they've been used as return results. So that has the same net effect as the garbage collection approach — the object may be released at any point after the final reference to it vanishes.
A workaround would be to create a class like:
@interface PFDoNothing : NSObject
+ (void)doNothingWith:(id)object;
@end
Which is implemented to do nothing. Post your autoreleased object to it after you've finished using the internal memory. Objective-C's dynamic dispatch means that it isn't safe for the compiler to optimise the call away — it has no way of knowing you (or the KVO mechanisms or whatever other actor) haven't done something like a method swizzle at runtime.
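A usage sketch:

// The compiler cannot prove this call is side-effect free, so someData
// must be kept alive until at least this point:
[PFDoNothing doNothingWith:someData];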
EDIT: since NSData is a special case in that it offers direct C-level access to object-held memory, it's not difficult to find explicit discussions of at least the GC situation. See this thread on Cocoabuilder for a pretty good one, though the same caveat as above applies, i.e. garbage collection is deprecated and automatic reference counting acts differently.
The following is a generic answer that does not necessarily reflect Objective-C GC support. However, various GC implementations, including ref-counting, can be thought of in terms of Reachability, quirks aside.
In a GC language, an object is guaranteed to exist as long as it is Strongly-Reachable; the "roots" of these Strong-Reachability graphs can vary by language and executing environment. The exact meaning of "Strongly" also varies, but generally means that the edges are Strong-References. (In a manual ref-counting scenario each edge can be thought of as an unmatched "retain" from a given "owner".)
C# on the CLR/.NET is one such implementation where a variable can remain in scope and yet not function as a "root" for a reachability graph. See the System.Timers.Timer class and look for GC.KeepAlive:
If the timer is declared in a long-running method, use KeepAlive to prevent garbage collection from occurring [on the timer object] before the method ends.
As of summer 2012, things are in the process of change for Apple objects that return inner pointers of non-object type. In the release notes for Mountain Lion, Apple says:
NS_RETURNS_INNER_POINTER
Methods which return pointers (other than Objective C object type) have been decorated with the clang compiler attribute objc_returns_inner_pointer (when compiling with clang) to prevent the compiler from aggressively releasing the receiver expression of those messages, which no longer appear to be referenced, while the returned pointer may still be in use.
Inspection of the NSData.h header file shows that this also applies from iOS 6 onward.
Also note that NS_RETURNS_INNER_POINTER is defined as __attribute__((objc_returns_inner_pointer)) in the clang specification, which makes it such that
the object's lifetime will be extended until at least the earliest of: the last use of the returned pointer, or any pointer derived from it, in the calling function; or the autorelease pool is restored to a previous state.
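A minimal sketch (hypothetical class) of a decorated declaration:

@interface PixelBuffer : NSObject // hypothetical class for illustration
// The returned pointer aliases memory owned by the receiver; the macro
// (i.e., __attribute__((objc_returns_inner_pointer))) tells ARC to keep
// the receiver alive while the pointer is in use:
- (const uint8_t *)rawPixels NS_RETURNS_INNER_POINTER;
@end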
Caveats:
If you're using anything older than Mountain Lion or iOS 6, you will still need to use one of the methods discussed here (e.g., __attribute__((objc_precise_lifetime))) when declaring your local NSData or NSMutableData objects.
Also, even with the newest compiler and Apple libraries, if you use older or third-party libraries whose objects do not decorate their inner-pointer-returning methods with __attribute__((objc_returns_inner_pointer)), you will need to decorate your local variable declarations of such objects with __attribute__((objc_precise_lifetime)), or use one of the other methods discussed in the answers.
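For those older targets, a minimal sketch of the local-variable annotation (Foundation also wraps it as NS_VALID_UNTIL_END_OF_SCOPE):

// Pin the object's lifetime to the variable's full lexical scope, so ARC
// cannot release it early even when the variable is no longer read:
__attribute__((objc_precise_lifetime)) NSMutableData *someData =
    [NSMutableData dataWithLength:123];
const CGGlyph *glyphs = someData.mutableBytes;
// ... glyphs remains valid through the end of the enclosing scope ...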
I am having trouble understanding part of the discussion of "selectors" in Apple's guide. I've bolded the parts where I'm getting confused:
In Objective-C, selector has two meanings. It can be used to refer simply to the name of a method when it’s used in a source-code message to an object. It also, though, refers to the unique identifier that replaces the name when the source code is compiled. Compiled selectors are of type SEL. All methods with the same name have the same selector. You can use a selector to invoke a method on an object—this provides the basis for the implementation of the target-action design pattern in Cocoa.

Methods and Selectors

For efficiency, full ASCII names are not used as method selectors in compiled code. Instead, the compiler writes each method name into a table, then pairs the name with a unique identifier that represents the method at runtime. The runtime system makes sure each identifier is unique: No two selectors are the same, and all methods with the same name have the same selector.
Can anyone explain these bits? Additionally, if different classes have methods with the same name, will they also have the same selector?
All selectors are uniqued -- both at compile time and, dynamically, at runtime through sel_getUid() or sel_registerName() (the latter being largely preferred, the former still around for historical reasons) -- for speed.
Back story: to call a method, the runtime needs a selector that identifies what is to be called and an object that it will be called on. This is why every single method call in Objective-C has two parameters: the obvious and well known self and the invisible, implied, parameter _cmd. _cmd is the SEL of the method currently executing. That is, you can paste this code into any method to see the name -- the selector -- of the currently executing method:
NSLog(#"%#", NSStringFromSelector(_cmd));
Note that _cmd is not a global; it really is an argument to your method. See below.
By uniquing the selectors, all selector based operations are implemented using pointer equality tests instead of string processing or any pointer de-referencing at all.
In particular, every single time you make a method call:
[someObject doSomething: toThis withOptions: flags]; // calls SEL doSomething:withOptions:
The compiler generates this code (or a very closely related variant):
objc_msgSend(someObject, @selector(doSomething:withOptions:), toThis, flags);
The very first thing objc_msgSend() does is check whether someObject is nil and short-circuit if it is (nil-eats-message). The next (ignoring tagged pointers) is to look up the selector in someObject's class (via the isa pointer, in fact), find the implementation, and call it (using a tail-call optimization).
That find the implementation thing has to be fast and to make it really fast, you want the key to finding the implementation of the method to be as fast and stable as possible. To do that, you want the key to be directly usable and globally unique to the process.
Thus, the selectors are uniqued.
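A quick demonstration of what uniquing buys (pointer equality, no string comparison):

#import <Foundation/Foundation.h>
#import <objc/runtime.h>

int main(void) {
    @autoreleasepool {
        SEL a = @selector(description);
        SEL b = sel_registerName("description");
        // Same name, same process => same pointer:
        NSLog(@"equal: %d", a == b); // prints "equal: 1"
    }
    return 0;
}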
That it also happens to save memory is a fantastic benefit, but the messenger would happily use more memory than it does today if that made messaging 2x faster (though not 10x the memory for 2x the speed -- perhaps not even 2x the memory for 2x the speed; while speed is critical, memory use is also critical).
If you really want to dive deep on how objc_msgSend() works, I wrote a bit of a guide. Note that it is slightly out of date as it was written before tagged pointers, blocks-as-implementation, and ARC were disclosed. I should update the articles.
Yes. Classes do share selectors.
I can give an example from the runtime's source code (objc-sel.mm): when you use sel_registerName() (used behind the scenes by @selector()), it copies the input string into an internal buffer (if the string hasn't been registered before), which all future SELs for that name point to.
This is done for less memory usage, and easier message forwarding.
It also, though, refers to the unique identifier that replaces the name when the source code is compiled... All methods with the same name have the same selector.
For this, I refer to an excellent blog post on selectors:
A selector is the same for all methods that have the same name and parameters — regardless of which objects define them, whether those objects are related in the class hierarchy, or actually have nothing to do with each other. At runtime, Objective-C goes to the class and outright asks it, "Do you respond to this selector?", and calls the resulting function pointer if it does.
The runtime system makes sure each identifier is unique: No two selectors are the same, and all methods with the same name have the same selector.
In a screwed up way, this makes sense. If Method A and Method B have the exact same name and arguments, wouldn't it be more efficient to store their selector as one lookup and query the receiver instead of deciding between two basically equally named selectors at runtime?
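A short sketch showing two unrelated classes sharing one selector:

#import <Foundation/Foundation.h>

@interface Dog : NSObject
- (void)speak;
@end
@implementation Dog
- (void)speak { NSLog(@"Woof"); }
@end

@interface Robot : NSObject
- (void)speak;
@end
@implementation Robot
- (void)speak { NSLog(@"Beep"); }
@end

int main(void) {
    @autoreleasepool {
        SEL s = @selector(speak); // one identifier for every `speak`
        [[Dog new] performSelector:s];   // dispatch finds Dog's implementation
        [[Robot new] performSelector:s]; // ...and Robot's
    }
    return 0;
}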
Look at the SEL type: you don't have to define which class the selector is from, you just give it a method name. For example:
SEL animationSelector = @selector(addAnimation:forKey:);
You can imagine it as a street name, for example. Many cities can have the same street names, but a street name without a city is worthless. The same goes for selectors: you can define a selector without naming the class it belongs to, but it's completely worthless without a fitting class.
Maybe it's useful that calling a method that MyClass doesn't understand on something typed as MyClass is an error rather than a warning, since it's probably either a mistake or going to cause mistakes in the future...
However, why is this error specific to ARC? ARC decides what it needs to retain/release/autorelease based on the cocoa memory management conventions, which would suggest that knowing the selector's name is enough. So it makes sense that there are problems with passing a SEL variable to performSelector:, as it's not known at compile-time whether the selector is an init/copy/new method or not. But why does seeing this in a class interface or not make any difference?
Am I missing something about how ARC works, or are the clang warnings just being a bit inconsistent?
ARC decides what it needs to retain/release/autorelease based on the cocoa memory management conventions, which would suggest that knowing the selector's name is enough.
This is just one way that ARC determines memory management. ARC can also determine memory management via attributes. For example, you can declare any typedef retainable using __attribute__((NSObject)) (never, ever do this, but it's legal). You can also use other attributes like __attribute__((ns_returns_retained)) and several others to override naming conventions (these are things you might reasonably do if you couldn't fix the naming; but it's much better to fix the naming).
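For instance, a sketch (hypothetical names) of such an overriding declaration:

@interface WidgetFactory : NSObject // hypothetical class for illustration
// Not in the alloc/new/copy family by name, yet it returns +1; every
// compilation unit that calls this must see the attribute to balance it:
- (id)makeWidget __attribute__((ns_returns_retained));
@end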
Now, imagine a case where the header file that declares these attributes is included in some files but not others. Some compile units (.m files) now manage that object's memory one way and some manage it another. Hijinks ensue. This is much, much worse than the situation without ARC, and the resulting bugs would be mind-bending because some ARC code would do one thing and other ARC code would do something different.
So, yeah, don't do that. (Of course you should never ignore warnings in Objective-C anyway, but this is a particularly nasty situation.)
It's an ounce of prevention, I'd assume. Incidentally, it's not foolproof in larger systems, because selectors do not need to match and matching is done per translation unit, so it could still blow up on you if you are not writing your program in a way that introduces type safety. Something is better than nothing, though!
The compiler wants to know about parameters and return types, potentially annotations and out parameters. ObjC has defaults to fall back on, but it's a good source of multiple types of bugs as the compiler does more for you.
There are a number of reasons you should introduce type safety and turn up the warning levels. With ARC, there are even more. Regardless of whether it is truly necessary, it's a good direction for an objc compiler to move towards (IMHO). You might consider C99 safer than ObjC 2.0 in this regard ;)
If there really is a restriction for codegen, I'd like to hear it.
I'm trying to interface Lua with Objective-C, and I think string conversion with NSSelectorFromString() has too big an overhead because Lua has to copy all strings to internalize them (although I'm not sure about this).
So I'm trying to find more lightweight way to represent a selector in Lua.
An Objective-C selector is an abstracted type, but it's defined as a pointer to something:
typedef struct objc_selector *SEL;
So it looks safe to handle as a regular pointer, so I can pass it to Lua with lightuserdata. Is this fine?
I don't believe it is safe to handle it as a pointer (even a void pointer), because this could change in a future implementation, or in a different implementation of the language. I haven't seen a formal Objective-C spec that says what is implementation-defined, but often when opaque types like this are used, it means you shouldn't have to know the details of what the underlying type is. In fact, the struct is forward-declared so that you can't access any of its members.
The other problem you might run into is implementing equality comparisons: are selectors references to a pool of constants, or is each selector mutable? Once again, implementation-defined.
Using C strings as suggested above is probably your best bet; Ruby manages to use symbols for selectors and doesn't suffer too much of a performance penalty. Since the strings are const, Lua shouldn't need to copy them, but probably does anyway to be safe. If you can find a way to avoid copying the strings, you might not take that much of a performance hit.
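As a sketch of that approach: a selector's name is a const C string that is stable for the lifetime of the process, so it can round-trip through Lua without exposing the SEL pointer itself (the Lua hand-off is left as a comment):

#import <Foundation/Foundation.h>
#import <objc/runtime.h>

int main(void) {
    @autoreleasepool {
        SEL original = @selector(doSomethingWith:);
        const char *name = sel_getName(original); // stable for the process lifetime
        // ... hand `name` to Lua (as a string, or as light userdata) ...
        SEL recovered = sel_registerName(name);   // convert back on the way out
        NSLog(@"round-trip ok: %d", original == recovered); // prints 1: uniqued
    }
    return 0;
}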