NSString and Objective-C Memory Management ARC - objective-c

I'm a little confused when coding an Objective-C project. The ARC is on. Here is a sample code:
NSString *foo = [[NSString alloc] initWithUTF8String:"This is a C string."];
// Use foo here...
foo = #"This is an ObjC string."
Here are my questions:
Do I need to explicitly terminate C string with '\0' in initWithUTF8String: method, or it is okay to omit NULL terminator?
Is there any memory leakage when I reuse foo as a pointer and assign new Objective-C string to it? Why?
If I change NSString to other class, like NSObject or my own class, is there any difference for question 2? (Initialize an object and then reassign other value directly to it.)
Thank you!

An explicit \0 is not required because in C (and hence Objective C), quoted string literals are null-terminated implicitly by the compiler. Here's a similar question.
Do string literals that end with a null-terminator contain an extra null-terminator?
No memory leakage. The ARC-configured compiler will generate code to release the first string that was being referenced before assigning the new string.
No change. You may get a compile-time warning if the types aren't compatible.

You must have the null terminator. From the documentation: "bytes - A NULL-terminated C array of bytes in UTF-8 encoding. This value must not be NULL."
No. The compiler will insert implicit release of the previous value and retain the new one since you declared foo with (implicit) strong semantics. From the documentation: "__strong is the default. An object remains “alive” as long as there is a strong pointer to it."
In general, no.

For the first question, what Apple's official documentation tells me is:
Returns a string created by copying the data from a given C array of UTF8-encoded bytes.
(id)stringWithUTF8String:(const char *)bytes
Parameters
bytes
A NULL-terminated C array of bytes in UTF8 encoding.
But since string literal is NULL terminated by default (as #TomSwift points out), it's okay to omit it.

Related

How to access a character in NSMutableString Objective-C

I have an instance of NSMutableString called MyMutableStr and I want access its character at index 7.
For example:
unsigned char cMy = [(NSString*) MyMutableStr characterAtIndex:7];
I think this is an ugly way; it's too much code.
My question is: Are there more simple ways in Objective-C to access the character in NSMutableString?
Like, in C language we can access a character of a string using [ ] operator:
unsigned char cMy = MyMutableStr[7];
The way of doing it is to use characterAtIndex:, but you don't need to cast it to a NSString pointer, since NSMutableString is a subclass of NSString. So it isn't that long, but if you still don't find it comfortable, I suggest to use UTF8String to obtain a C string over which you can iterate using the brackets operator:
const char* cString= [MyMutableStr UTF8String];
char first= cString[0];
But remember this (taken from NSString class reference):
The returned C string is automatically freed just as a returned object would be released; you should copy the C string if it needs to store it outside of the autorelease context in which the C string is created.
As others said characterAtIndex: but a few things you might want to consider carefully.
First you're dealing with an mutable string. You want to be careful to avoid it changing out from under you. One way is to an immutable copy and use that for the op.
Second, you're dealing with Unicode so you may want to consider normalizing your string to get a precomposed form as some visual representations may be more than one actual unichar. That's often a stumbling block for folks.

objective-c strings: why don't you need a setter/getter?

I'm just beginning, and I'm a little hung up on this. I may have a fundamental misunderstanding with which you can kindly help me out.
Why is it that you can assign a string value to an NSString* (and, I'm sure, many other object types) directly? E.g.,
NSString* s = #"Hello, world!";
whereas the following code, I believe, would assign to s2 s1's pointer value (and therefore only incidentally provide s2 with a string value)?
NSString* s1 = #"Hello, world!";
NSString* s2 = s1;
For many objects, don't you have to indicate a property, a.k.a. instance variable, to which you want to assign a value (i.e., use a setter method)? Shouldn't the object itself accept assignments only of pointer values? Or do classes such as NSString automatically reinterpret code such as the first example above to assign the indicated string to an implied instance variable using an implied setter?
Why is it that you can assign a string value to an NSString* (and, I'm
sure, many other object types) directly?
Though it may look like it, you are not assigning the value of the string 'directly' to the instance variable. You are actually assigning the address of the string value to your instance variable. Now, the real question is what is going on behind the scenes when you have an expression of the type:
NSString * str = #"Hello World";
This expression represents the creation of a string literal. In C (and Objective-C which is a strict superset of C), string literals get special handling. Specifically, the following happens:
When your code is compiled the string "Hello World" will be created in the data section of
the program.
When the program is executing, an instance variable 'str' will be allocated on the heap.
The 'str' instance variable will be pointed at the static memory location where the actual string "Hello World" is stored.
The main difference between your first and second examples is that in the second example the memory for the string variable is dynamically allocated on the heap, at runtime. Note that in both cases the variable 'str' is just a pointer allocated dynamically.
More or less the latter. String literals like #"Hello World!" are treated as a special case in Objective-C: strings declared with that syntax are statically allocated, instantiated and cached at compile time to improve performance. From the programmer's perspective, it's no different from calling [NSString stringWithString:#"Hello World!"] or a constructor that takes a C-string -- you should just think of it as syntactic sugar.
FWIW, Objective-C has recently begun extending the # prefix to allow declaring dictionary and array literals as well, e.g.: #{ #"key" : #"value" } or #[ obj1, obj2, obj3 ].
This is a function of the compiler and not a language construct. The compiler in this case recognizes a string literal and inserts some code to produce the intended result.
#"" is essentially shorthand for NSString's +stringWithUTF8String method.
take from here:
What does the # symbol represent in objective-c?
NSString *s1 = #"Hello, world!";
is essentially equivalent to
NSString *s1 = [NSString stringWithUTF8String:"Hello, world!"];
The former allocates a new NSString object statically (instead of on the heap at runtime, as the latter would do).
It's important to note that these are just pointers. When you do NSString *s2 = s1, both s1 and s2 refer to the same object.

Do I need to release a constant NSString?

I'm reading memory management rules to this point where it said
- (void)printHello {
NSString *string;
string = [[NSString alloc] initWithString:#"Hello"];
NSLog(#"%#", string);
[string release];
}
you have ownership and have to release string, but I'm curious about the #"Hello". #" " is the syntax for creating and NSString, and it's an object. So doesn't that get leaked?
#"…" is a literal instance of NSString. When the compiler sees a literal string, it maps the string into the binary file (e.g. your program) and the string is available as an NSString object when the binary is loaded (e.g. when you run your program). You don’t have to manage the memory occupied by literal strings because they’re an intrinsic part of your binary — they are always available, they never get released, and you don’t have to worry about managing their memory.
Bavarious's answer is correct. For the curious, I can add that this is documented in Apple's “String Programming Guide”, specifically the section “Creating Strings” where it says (emphasis mine):
The simplest way to create a string object in source code is to use the Objective-C #"..." construct:
NSString *temp = #"/tmp/scratch";
Note that, when creating a string constant in this fashion, you should use UTF-8 characters. Such an object is created at compile time and exists throughout your program’s execution. The compiler makes such object constants unique on a per-module basis, and they’re never deallocated.

What does assigning a literal string to an NSString with "=" actually do?

What does the following line actually do?
string = #"Some text";
Assuming that "string" is declared thusly in the header:
NSString *string;
What does the "=" actually do here? What does it do to "string"'s reference count? In particular, assuming that for some reason "string" is not otherwise assigned to, does it need to be released?
Thanks!
The assignment is just that. The string pointer is basically a label that points to specific address in memory. Reassignment statement would point that label to another address in memory!
It doesn't change reference counting or do anything beyond that in Objective-C. You need to maintain the reference count yourself, if you are running in a non-garbage-collection environment:
[string release];
string = [#"Some text" retain];
However, string literals don't need to be managed, as they get allocated statically and never get deallocated! So the release and retain methods are just NOOPs (i.e. no operations). You can safely omit them.
What does the following line actually do?
string = #"Some text";
Assuming that "string" is declared thusly in the header:
NSString *string;
What does the "=" actually do here? What does it do to "string"'s reference count?
string is not a string.
string is, in fact, not any other kind of Cocoa object, either.
string is a variable, which you've created to hold an instance of NSString. The assignment operator puts something into a variable*. In your example above, you create a literal string, and put that into the variable.
Since string is a variable, not a Cocoa object, it has no reference count.
Assigning an object somewhere can extend the object's lifetime in garbage-collected code (only on the Mac). See the Memory Management Programming Guide for Cocoa for more details.
*Or a C array. Don't confuse these with Cocoa arrays; they're not interchangeable, and you can't use the assignment operator to put things into a Cocoa collection (not in Objective-C, anyway).
When you use a literal like in this case, it is just syntactic sugar to quickly create an NSString object. Once created, the object behaves just like another other. The difference here is that your string is compiled into the program instead of created dynamically.

What's the difference between a string constant and a string literal?

I'm learning objective-C and Cocoa and have come across this statement:
The Cocoa frameworks expect that global string constants rather than string literals are used for dictionary keys, notification and exception names, and some method parameters that take strings.
I've only worked in higher level languages so have never had to consider the details of strings that much. What's the difference between a string constant and string literal?
In Objective-C, the syntax #"foo" is an immutable, literal instance of NSString. It does not make a constant string from a string literal as Mike assume.
Objective-C compilers typically do intern literal strings within compilation units — that is, they coalesce multiple uses of the same literal string — and it's possible for the linker to do additional interning across the compilation units that are directly linked into a single binary. (Since Cocoa distinguishes between mutable and immutable strings, and literal strings are always also immutable, this can be straightforward and safe.)
Constant strings on the other hand are typically declared and defined using syntax like this:
// MyExample.h - declaration, other code references this
extern NSString * const MyExampleNotification;
// MyExample.m - definition, compiled for other code to reference
NSString * const MyExampleNotification = #"MyExampleNotification";
The point of the syntactic exercise here is that you can make uses of the string efficient by ensuring that there's only one instance of that string in use even across multiple frameworks (shared libraries) in the same address space. (The placement of the const keyword matters; it guarantees that the pointer itself is guaranteed to be constant.)
While burning memory isn't as big a deal as it may have been in the days of 25MHz 68030 workstations with 8MB of RAM, comparing strings for equality can take time. Ensuring that most of the time strings that are equal will also be pointer-equal helps.
Say, for example, you want to subscribe to notifications from an object by name. If you use non-constant strings for the names, the NSNotificationCenter posting the notification could wind up doing a lot of byte-by-byte string comparisons when determining who is interested in it. If most of these comparisons are short-circuited because the strings being compared have the same pointer, that can be a big win.
Some definitions
A literal is a value, which is immutable by definition. eg: 10
A constant is a read-only variable or pointer. eg: const int age = 10;
A string literal is a expression like #"". The compiler will replace this with an instance of NSString.
A string constant is a read-only pointer to NSString. eg: NSString *const name = #"John";
Some comments on the last line:
That's a constant pointer, not a constant object1. objc_sendMsg2 doesn't care if you qualify the object with const. If you want an immutable object, you have to code that immutability inside the object3.
All #"" expressions are indeed immutable. They are replaced4 at compile time with instances of NSConstantString, which is a specialized subclass of NSString with a fixed memory layout5. This also explains why NSString is the only object that can be initialized at compile time6.
A constant string would be const NSString* name = #"John"; which is equivalent to NSString const* name= #"John";. Here, both syntax and programmer intention are wrong: const <object> is ignored, and the NSString instance (NSConstantString) was already immutable.
1 The keyword const applies applies to whatever is immediately to its left. If there is nothing to its left, it applies to whatever is immediately to its right.
2 This is the function that the runtime uses to send all messages in Objective-C, and therefore what you can use to change the state of an object.
3 Example: in const NSMutableArray *array = [NSMutableArray new]; [array removeAllObjects]; const doesn't prevent the last statement.
4 The LLVM code that rewrites the expression is RewriteModernObjC::RewriteObjCStringLiteral in RewriteModernObjC.cpp.
5 To see the NSConstantString definition, cmd+click it in Xcode.
6 Creating compile time constants for other classes would be easy but it would require the compiler to use a specialized subclass. This would break compatibility with older Objective-C versions.
Back to your quote
The Cocoa frameworks expect that global string constants rather than
string literals are used for dictionary keys, notification and
exception names, and some method parameters that take strings. You
should always prefer string constants over string literals when you
have a choice. By using string constants, you enlist the help of the
compiler to check your spelling and thus avoid runtime errors.
It says that literals are error prone. But it doesn't say that they are also slower. Compare:
// string literal
[dic objectForKey:#"a"];
// string constant
NSString *const a = #"a";
[dic objectForKey:a];
In the second case I'm using keys with const pointers, so instead [a isEqualToString:b], I can do (a==b). The implementation of isEqualToString: compares the hash and then runs the C function strcmp, so it is slower than comparing the pointers directly. Which is why constant strings are better: they are faster to compare and less prone to errors.
If you also want your constant string to be global, do it like this:
// header
extern NSString *const name;
// implementation
NSString *const name = #"john";
Let's use C++, since my Objective C is totally non-existent.
If you stash a string into a constant variable:
const std::string mystring = "my string";
Now when you call methods, you use my_string, you're using a string constant:
someMethod(mystring);
Or, you can call those methods with the string literal directly:
someMethod("my string");
The reason, presumably, that they encourage you to use string constants is because Objective C doesn't do "interning"; that is, when you use the same string literal in several places, it's actually a different pointer pointing to a separate copy of the string.
For dictionary keys, this makes a huge difference, because if I can see the two pointers are pointing to the same thing, that's much cheaper than having to do a whole string comparison to make sure the strings have equal value.
Edit: Mike, in C# strings are immutable, and literal strings with identical values all end pointing at the same string value. I imagine that's true for other languages as well that have immutable strings. In Ruby, which has mutable strings, they offer a new data-type: symbols ("foo" vs. :foo, where the former is a mutable string, and the latter is an immutable identifier often used for Hash keys).