What's the difference between a string constant and a string literal? - objective-c

I'm learning objective-C and Cocoa and have come across this statement:
The Cocoa frameworks expect that global string constants rather than string literals are used for dictionary keys, notification and exception names, and some method parameters that take strings.
I've only worked in higher level languages so have never had to consider the details of strings that much. What's the difference between a string constant and string literal?

In Objective-C, the syntax @"foo" is an immutable, literal instance of NSString. It does not make a constant string from a string literal, as Mike assumes.
Objective-C compilers typically do intern literal strings within compilation units — that is, they coalesce multiple uses of the same literal string — and it's possible for the linker to do additional interning across the compilation units that are directly linked into a single binary. (Since Cocoa distinguishes between mutable and immutable strings, and literal strings are always also immutable, this can be straightforward and safe.)
Constant strings on the other hand are typically declared and defined using syntax like this:
// MyExample.h - declaration, other code references this
extern NSString * const MyExampleNotification;
// MyExample.m - definition, compiled for other code to reference
NSString * const MyExampleNotification = @"MyExampleNotification";
The point of the syntactic exercise here is that you can make uses of the string efficient by ensuring that there's only one instance of that string in use even across multiple frameworks (shared libraries) in the same address space. (The placement of the const keyword matters; it guarantees that the pointer itself is constant.)
While burning memory isn't as big a deal as it may have been in the days of 25MHz 68030 workstations with 8MB of RAM, comparing strings for equality can take time. Ensuring that most of the time strings that are equal will also be pointer-equal helps.
Say, for example, you want to subscribe to notifications from an object by name. If you use non-constant strings for the names, the NSNotificationCenter posting the notification could wind up doing a lot of byte-by-byte string comparisons when determining who is interested in it. If most of these comparisons are short-circuited because the strings being compared have the same pointer, that can be a big win.
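To make the pointer-equality argument concrete, here is a C sketch (Objective-C is a superset of C) of the short-circuit described above, with plain C strings standing in for NSString. It is not Foundation's actual implementation of isEqualToString: or of NSNotificationCenter; names_equal and the slow_compares counter are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Counts how often the slow path ran, so the effect of the fast path is visible. */
static int slow_compares = 0;

/* Hypothetical helper: an equality test that short-circuits on identical
   pointers before falling back to a byte-by-byte comparison. */
bool names_equal(const char *a, const char *b) {
    assert(a != NULL && b != NULL);   /* strcmp needs real strings */
    if (a == b) {
        return true;                  /* same pointer: equal, no bytes read */
    }
    slow_compares++;
    return strcmp(a, b) == 0;         /* different pointers: compare contents */
}
```

When every poster and observer uses the same global constant, the pointers match and the strcmp branch never runs. A name that was typed out again, or built at run time, still compares equal, just more slowly.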

Some definitions
A literal is a value, which is immutable by definition. e.g.: 10
A constant is a read-only variable or pointer. e.g.: const int age = 10;
A string literal is an expression like @"". The compiler will replace this with an instance of NSString.
A string constant is a read-only pointer to NSString. e.g.: NSString *const name = @"John";
Some comments on the last line:
That's a constant pointer, not a constant object[1]. objc_msgSend[2] doesn't care if you qualify the object with const. If you want an immutable object, you have to code that immutability inside the object[3].
All @"" expressions are indeed immutable. They are replaced[4] at compile time with instances of NSConstantString, which is a specialized subclass of NSString with a fixed memory layout[5]. This also explains why NSString is the only object that can be initialized at compile time[6].
A constant string would be const NSString* name = @"John"; which is equivalent to NSString const* name = @"John";. Here, both syntax and programmer intention are wrong: const <object> is ignored, and the NSString instance (NSConstantString) was already immutable.
[1] The keyword const applies to whatever is immediately to its left. If there is nothing to its left, it applies to whatever is immediately to its right.
[2] This is the function that the runtime uses to send all messages in Objective-C, and therefore what you can use to change the state of an object.
[3] Example: in const NSMutableArray *array = [NSMutableArray new]; [array removeAllObjects]; const doesn't prevent the last statement.
[4] The LLVM code that rewrites the expression is RewriteModernObjC::RewriteObjCStringLiteral in RewriteModernObjC.cpp.
[5] To see the NSConstantString definition, Cmd-click it in Xcode.
[6] Creating compile-time constants for other classes would be easy, but it would require the compiler to use a specialized subclass. This would break compatibility with older Objective-C versions.
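Objective-C inherits C's declaration syntax, so the placement rule in the first footnote can be checked with plain char pointers. This C sketch uses char where the answer uses NSString; the variable and function names are made up for illustration. Only the pointer half carries over: as the third footnote says, const does not make an Objective-C object immutable.

```c
#include <assert.h>
#include <string.h>

/* Pointer to const char: the pointer can be re-seated, but the characters
   cannot be modified through it. Same shape as `const NSString *name`. */
const char *pointer_to_const = "John";

/* Const pointer to (modifiable) char: the pointer is fixed, but the characters
   it points at can still change. Same shape as `NSString *const name`. */
static char storage[] = "John";
char *const const_pointer = storage;

void exercise_const_placement(void) {
    pointer_to_const = "Jane";   /* allowed: the pointer itself is not const */
    const_pointer[0] = 'S';      /* allowed: the characters are not const */
    /* const_pointer = storage;     would not compile: the pointer is const */
    /* pointer_to_const[0] = 'X';   would not compile: chars are read-only here */
    assert(const_pointer == storage);   /* still pointing at the same storage */
}
```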
Back to your quote
The Cocoa frameworks expect that global string constants rather than string literals are used for dictionary keys, notification and exception names, and some method parameters that take strings. You should always prefer string constants over string literals when you have a choice. By using string constants, you enlist the help of the compiler to check your spelling and thus avoid runtime errors.
It says that literals are error prone. But it doesn't say that they are also slower. Compare:
// string literal
[dic objectForKey:@"a"];
// string constant
NSString *const a = @"a";
[dic objectForKey:a];
In the second case I'm using keys with const pointers, so instead of [a isEqualToString:b], I can do (a == b). The implementation of isEqualToString: compares the hash and then runs the C function strcmp, so it is slower than comparing the pointers directly. That is why constant strings are better: they are faster to compare and less prone to errors.
If you also want your constant string to be global, do it like this:
// header
extern NSString *const name;
// implementation
NSString *const name = @"john";
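The same extern pattern works in plain C, and it also shows the spelling benefit from the quote: a misspelled identifier such as kNmaeKey fails to compile, while a misspelled literal such as "nmae" compiles and silently finds nothing. In this sketch kNameKey, struct entry, and lookup are hypothetical; the tiny table stands in for an NSDictionary, which really compares keys with hash and isEqual: rather than by pointer alone.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "Header" part: other translation units would see only this declaration. */
extern const char *const kNameKey;

/* "Implementation" part: exactly one definition, so every user shares one pointer. */
const char *const kNameKey = "name";

/* Hypothetical minimal key/value table. */
struct entry { const char *key; const char *value; };

/* Looks a key up by pointer identity first, then falls back to comparing text. */
const char *lookup(const struct entry *table, size_t count, const char *key) {
    assert(key != NULL);
    for (size_t i = 0; i < count; i++) {
        if (table[i].key == key) return table[i].value;              /* fast path */
    }
    for (size_t i = 0; i < count; i++) {
        if (strcmp(table[i].key, key) == 0) return table[i].value;   /* slow path */
    }
    return NULL;   /* not found, e.g. a misspelled literal */
}
```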

Let's use C++, since my Objective C is totally non-existent.
If you stash a string into a constant variable:
const std::string mystring = "my string";
Now when you call methods using mystring, you're using a string constant:
someMethod(mystring);
Or, you can call those methods with the string literal directly:
someMethod("my string");
The reason, presumably, that they encourage you to use string constants is because Objective C doesn't do "interning"; that is, when you use the same string literal in several places, it's actually a different pointer pointing to a separate copy of the string.
For dictionary keys, this makes a huge difference, because if I can see the two pointers are pointing to the same thing, that's much cheaper than having to do a whole string comparison to make sure the strings have equal value.
Edit: Mike, in C# strings are immutable, and literal strings with identical values all end up pointing at the same string value. I imagine that's true for other languages as well that have immutable strings. In Ruby, which has mutable strings, they offer a new data type: symbols ("foo" vs. :foo, where the former is a mutable string, and the latter is an immutable identifier often used for Hash keys).

Related

ObjC ternary operator and const strings

If you have a static const string, then its usage may cause inconsistent interpretation by the compiler. For example in this case:
const NSString* kUntitled = @"Untitled";
NSString* title = kUntitled;
the compiler will complain about assigning a const pointer to a non-const one ("discards qualifiers") and probably rightly so. This can be solved by either not using const at all, or by invoking kUntitled.copy (I somehow don't like the idea of typecasting (NSString*)kUntitled)
However, if you have instead:
NSString* title = aTitle ?: kUntitled;
then the compiler doesn't complain.
My first question is, can the warning be ignored in the first example? Are there any potential dangers in assigning a const NSString to a non-const one?
Second is, why does the compiler ignore the case with the ternary operator?
Welcome to the Weird and Wonderful World of C Declarations - the stuff of quiz questions ;-)
const NSString* kUntitled = @"Untitled";
You probably haven't written what you intended here. This defines kUntitled to be a mutable pointer to a "constant" string, usually referred to as a "pointer to a constant". The quotes around "constant" are there for a reason: despite the common name, it is really a "read-only pointer", meaning you can read but not write via the pointer. What is pointed at might well be mutable, but if so it is not mutable via this pointer.
Confused? What the above all means is that you can later write:
kUntitled = @"oops you probably thought you couldn't assign";
As the pointer itself is mutable, it can be changed to point at other things.
What you probably intended was:
NSString * const kUntitled = @"Untitled";
which declares a constant pointer to a string; it is the pointer itself which cannot be changed, so:
kUntitled = @"this will produce a compile error, can't change a constant";
If you use this version of the declaration then you won't get an error on your assignments:
NSString* title = kUntitled;
NSString* title = aTitle ?: kUntitled;
However that still leaves the question of why you didn't get an error from the second with your original declaration...
The RHS of the assignment, aTitle ?: kUntitled is actually valid, the weird world of C again. This expression is just shorthand for aTitle ? aTitle : kUntitled and the rules for this operator in C state that the second and third arguments can be of the same base pointer type, NSString * in your case, but differ in qualifiers, const in your case, and the resultant type is the base pointer type with all the qualifiers of the two operands.
In other words the result of this expression is treated as const NSString *. Which means you should get the same warning as for the first assignment.
It appears that the compiler is treating the operator as though the resultant type is the base pointer type with none or only the common qualifiers of the two operands - i.e. the opposite of the definition.
So for the second problem you may have found a compiler bug; you should report it to Apple (bugreport.apple.com) and see what they say.
HTH
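The standard C rule quoted in that answer (the conditional operator's result is a pointer to the type carrying the qualifiers of both operands) can be observed with C11's _Generic. This sketch uses char in place of NSString and the portable a ? a : b form instead of the GNU a ?: b extension; IS_PTR_TO_CONST_CHAR and conditional_result_is_const are hypothetical names. It shows what C specifies, not what the Objective-C compiler of the time actually did.

```c
#include <assert.h>

/* Evaluates to 1 when the expression's type is exactly `const char *`, else 0.
   The expression itself is not evaluated. */
#define IS_PTR_TO_CONST_CHAR(e) _Generic((e), const char *: 1, default: 0)

int conditional_result_is_const(void) {
    char buffer[] = "Doc";
    char *aTitle = buffer;               /* plays the role of aTitle */
    const char *kUntitled = "Untitled";  /* plays the role of kUntitled */
    /* Each operand keeps its own type... */
    assert(IS_PTR_TO_CONST_CHAR(aTitle) == 0);
    assert(IS_PTR_TO_CONST_CHAR(kUntitled) == 1);
    /* ...but the conditional expression picks up const from kUntitled, so
       assigning it to a plain `char *` should draw the same warning. */
    return IS_PTR_TO_CONST_CHAR(aTitle ? aTitle : kUntitled);
}
```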
The warning is a side effect of the fact that const NSString * kUntitled is incorrect. This is a declaration of a pointer-to-readonly-NSString. Note the placement of the "read only" there -- it's referring to the string, not the pointer. But ObjC objects are never read only, never const. You may say that literal NSStrings are, of course, but that's implementation dependent, and even modifiable in some runtime environments.
Thus you can never correctly assign this object anywhere else (unless that variable also was a pointer to a const object).
The declaration that you should be using is NSString * const kUntitled -- this is "readonly-pointer-to-NSString", i.e., the pointer cannot be changed to point at another object.

Why are instances created using a 'literal syntax' known as 'literals'?

Something that is bothering me is why the term 'literal' is used to refer to instances of classes like NSString and NSArray. I had only seen the term used in reference to NSString and being naive I thought it had something to do with it 'literally' being a string, that is between quotation markers. Sorry if that sounds pathetic, but that was how I had been thinking about it.
Then today I learned that certain instances of NSArray can also be referred to as literal instances, i.e. an instance of the class created using a 'literal syntax'.
As @Linuxios notes, literal syntaxes are built into the language. They're broader than you think, though. A literal just means that an actual value is encoded in the source. So there are quite a few literal syntaxes in ObjC. For example:
1 - int
1.0 - double
1.0f - float
"a" - C-string
@"a" - NSString
@[] - NSArray
^{} - function
Yeah, blocks are just function literals. They are an anonymous value that is assignable to a symbol name (such as a variable or constant).
Generally speaking, literals can be stored in the text segment and be computed at compile time (rather than at run time). If I remember correctly, array literals are currently expanded into the equivalent code and evaluated at runtime, but @"..." string literals are encoded into the binary as static data (at least now they are; non-Apple versions of gcc used to encode an actual function call to construct static strings as I remember).
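The C-level entries in that list can be checked directly. This sketch uses C11's _Generic to report the type the compiler assigns to each literal; the @"a" and @[] forms need an Objective-C compiler and Foundation, so they are not covered, and TYPE_CODE and check_literal_types are made-up names.

```c
#include <assert.h>

/* Maps an expression to a character naming its type, decided at compile time. */
#define TYPE_CODE(e) _Generic((e), int: 'i', double: 'd', float: 'f', char *: 's', default: '?')

int check_literal_types(void) {
    assert(TYPE_CODE(1) == 'i');      /* integer literal: int */
    assert(TYPE_CODE(1.0) == 'd');    /* floating literal: double */
    assert(TYPE_CODE(1.0f) == 'f');   /* f suffix: float */
    assert(TYPE_CODE("a") == 's');    /* string literal: char[2], decays to char * */
    assert(sizeof "a" == 2);          /* the 'a' plus the terminating NUL */
    return 1;
}
```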
A literal syntax or a literal is just an object that was created using a dedicated syntax built into the language instead of using the normal syntax for object creation (whatever that is).
Here I create a literal array:
NSArray* a = @[@"Hello", @"World"];
Which is, for all intents and purposes, equivalent to this:
NSArray* a = [NSArray arrayWithObjects:@"Hello", @"World", nil];
The first is called a literal because the @[] syntax is built into the language for creating arrays, in the same way that the @"..." syntax is built in for creating NSStrings.
the term 'literal' is used to refer to instances of classes
It's not referring to the instance really; after the object is created, the way it was created doesn't matter:
NSArray * thisWasCreatedWithALiteral = @[@1, @2];
NSArray * butWhoCares = thisWasCreatedWithALiteral;
The "literal" part is just the special syntax @[@1, @2], and
it ha[s] something to do with it 'literally' being a string, that is between quotation markers.
is exactly right: this is a written-out representation of the array, as opposed to one created with a constructor method like arrayWithObjects:

How to access a character in NSMutableString Objective-C

I have an instance of NSMutableString called MyMutableStr and I want to access its character at index 7.
For example:
unsigned char cMy = [(NSString*) MyMutableStr characterAtIndex:7];
I think this is an ugly way; it's too much code.
My question is: Are there more simple ways in Objective-C to access the character in NSMutableString?
Like, in C language we can access a character of a string using [ ] operator:
unsigned char cMy = MyMutableStr[7];
The way of doing it is to use characterAtIndex:, but you don't need to cast it to an NSString pointer, since NSMutableString is a subclass of NSString. So it isn't that long, but if you still don't find it comfortable, I suggest using UTF8String to obtain a C string that you can index with the bracket operator:
const char* cString= [MyMutableStr UTF8String];
char first= cString[0];
But remember this (taken from NSString class reference):
The returned C string is automatically freed just as a returned object would be released; you should copy the C string if it needs to store it outside of the autorelease context in which the C string is created.
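The C side of that advice, as a sketch: once you have the const char * from UTF8String, copy it into memory you own if it has to outlive the current autorelease context. copy_c_string is a hypothetical helper written in standard C (it does what POSIX strdup does); the caller must free() the result.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of `s` owned by the caller,
   or NULL if allocation fails. */
char *copy_c_string(const char *s) {
    assert(s != NULL);
    size_t size = strlen(s) + 1;   /* +1 for the terminating NUL */
    char *copy = malloc(size);
    if (copy != NULL) {
        memcpy(copy, s, size);
    }
    return copy;
}
```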
As others said, use characterAtIndex:, but there are a few things you might want to consider carefully.
First, you're dealing with a mutable string. You want to be careful to avoid it changing out from under you. One way is to make an immutable copy and use that for the operation.
Second, you're dealing with Unicode, so you may want to consider normalizing your string to a precomposed form, since some visible characters may be made up of more than one unichar. That's often a stumbling block for folks.
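Both caveats show up at the byte level. In this C sketch, "café" can be written with a precomposed é (U+00E9) or as e followed by a combining acute accent (U+0301); indexing the UTF-8 bytes with [] lands inside multi-byte sequences, and even counting code points gives different lengths for the two spellings. utf8_code_points is a hypothetical helper that assumes valid UTF-8; characterAtIndex: works in UTF-16 units instead, so its counts can differ again.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts Unicode code points in a UTF-8 string by
   skipping continuation bytes (bit pattern 10xxxxxx). Assumes valid UTF-8. */
size_t utf8_code_points(const char *s) {
    assert(s != NULL);
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if ((*p & 0xC0) != 0x80) {
            count++;   /* an ASCII byte or a lead byte starts a new code point */
        }
    }
    return count;
}
```

For the precomposed "caf\xC3\xA9", byte index 3 is 0xC3 (only the first half of é) and the function returns 4; for the decomposed "cafe\xCC\x81" it returns 5, although both display as café.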

objective-c strings: why don't you need a setter/getter?

I'm just beginning, and I'm a little hung up on this. I may have a fundamental misunderstanding with which you can kindly help me out.
Why is it that you can assign a string value to an NSString* (and, I'm sure, many other object types) directly? E.g.,
NSString* s = @"Hello, world!";
whereas the following code, I believe, would assign to s2 s1's pointer value (and therefore only incidentally provide s2 with a string value)?
NSString* s1 = @"Hello, world!";
NSString* s2 = s1;
For many objects, don't you have to indicate a property, a.k.a. instance variable, to which you want to assign a value (i.e., use a setter method)? Shouldn't the object itself accept assignments only of pointer values? Or do classes such as NSString automatically reinterpret code such as the first example above to assign the indicated string to an implied instance variable using an implied setter?
Why is it that you can assign a string value to an NSString* (and, I'm sure, many other object types) directly?
Though it may look like it, you are not assigning the value of the string 'directly' to the instance variable. You are actually assigning the address of the string value to your instance variable. Now, the real question is what is going on behind the scenes when you have an expression of the type:
NSString * str = @"Hello World";
This expression represents the creation of a string literal. In C (and Objective-C which is a strict superset of C), string literals get special handling. Specifically, the following happens:
When your code is compiled, the string "Hello World" is placed in the data section of the program.
When the program executes, storage for the pointer variable 'str' is set aside (on the stack if it is a local variable; it is not an instance variable unless you declared it as one).
The 'str' variable is set to point at the static memory location where the actual string "Hello World" is stored.
In your second example no new string is created: s2 = s1 only copies the address held in s1, so both variables point at the same statically allocated string. In both cases the variable is just a pointer.
More or less the latter. String literals like @"Hello World!" are treated as a special case in Objective-C: strings declared with that syntax are statically allocated, instantiated and cached at compile time to improve performance. From the programmer's perspective, it's no different from calling [NSString stringWithString:@"Hello World!"] or a constructor that takes a C-string; you should just think of it as syntactic sugar.
FWIW, Objective-C has recently begun extending the @ prefix to allow declaring dictionary and array literals as well, e.g.: @{ @"key" : @"value" } or @[ obj1, obj2, obj3 ].
This is a function of the compiler and not a language construct. The compiler in this case recognizes a string literal and inserts some code to produce the intended result.
@"" is essentially shorthand for NSString's +stringWithUTF8String: method.
Taken from here:
What does the @ symbol represent in objective-c?
NSString *s1 = @"Hello, world!";
is essentially equivalent to
NSString *s1 = [NSString stringWithUTF8String:"Hello, world!"];
The former allocates a new NSString object statically (instead of on the heap at runtime, as the latter would do).
It's important to note that these are just pointers. When you do NSString *s2 = s1, both s1 and s2 refer to the same object.
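A small C illustration of the last two points, with plain C strings in place of NSString: a string literal has static storage duration, so a pointer to it stays valid for the whole program, and assigning one pointer to another copies only the address. greeting and pointer_assignment_demo are hypothetical functions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical function: the literal lives for the whole program, so the
   returned pointer stays valid. Returning a local char array instead would
   hand the caller a dangling pointer. */
const char *greeting(void) {
    return "Hello, world!";
}

/* Copying the pointer shares the characters; nothing is duplicated. */
void pointer_assignment_demo(void) {
    const char *s1 = greeting();
    const char *s2 = s1;                        /* copies the address only */
    assert(s2 == s1);                           /* both refer to the same characters */
    assert(strcmp(s2, "Hello, world!") == 0);
}
```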

Unfamiliar C syntax in Objective-C context

I am coming to Objective-C from C# without any intermediate knowledge of C. (Yes, yes, I will need to learn C at some point and I fully intend to.) In Apple's Certificate, Key, and Trust Services Programming Guide, there is the following code:
static const UInt8 publicKeyIdentifier[] = "com.apple.sample.publickey\0";
static const UInt8 privateKeyIdentifier[] = "com.apple.sample.privatekey\0";
I have an NSString that I would like to use as an identifier here and for the life of me I can't figure out how to get that into this data structure. Searching through Google has been fruitless also. I looked at the NSString Class Reference and looked at the UTF8String and getCharacters methods but I couldn't get the product into the structure.
What's the simple, easy trick I'm missing?
Those are C strings: Arrays (not NSArrays, but C arrays) of characters. The last character is a NUL, with the numeric value 0.
“UInt8” is the CoreServices name for an unsigned octet, which (on Mac OS X) is the same as an unsigned char.
static means that the array is specific to this file (if it's in file scope) or persists across function calls (if it's inside a method or function body).
const means just what you'd guess: You cannot change the characters in these arrays.
\0 is a NUL, but including it explicitly in a "" literal as shown in those examples is redundant. A "" literal (without the @) is NUL-terminated anyway.
C doesn't specify an encoding. On Mac OS X, it's generally something ASCII-compatible, usually UTF-8.
To convert an NSString to a C-string, use UTF8String or cStringUsingEncoding:. To have the NSString extract the C string into a buffer, use getCString:maxLength:encoding:.
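A C sketch of two points from that list. The first is the redundant \0: the array simply gets a second terminator, which sizeof makes visible. The second is a bounded copy in the spirit of getCString:maxLength:encoding:, which returns NO when the buffer is too small. UInt8 is typedef'd locally (an assumption, so the sketch builds without Apple headers), copy_c_string_bounded is a hypothetical helper, and no encoding conversion is done.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef unsigned char UInt8;   /* assumption: stands in for CoreServices' UInt8 */

/* The explicit \0 adds a second terminator; the literal is NUL-terminated already. */
static const UInt8 withExplicitNul[] = "com.apple.sample.publickey\0";
static const UInt8 withoutExplicitNul[] = "com.apple.sample.publickey";

/* Hypothetical helper: copies `src` into `dst` only if it fits together with
   its terminator, and reports whether the copy happened. */
bool copy_c_string_bounded(char *dst, size_t maxLength, const char *src) {
    assert(dst != NULL && src != NULL);
    size_t needed = strlen(src) + 1;
    if (needed > maxLength) {
        return false;   /* too small: leave dst untouched */
    }
    memcpy(dst, src, needed);
    return true;
}
```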
I think some people are missing the point here. Everyone has explained the two constant arrays that are being set up for the tags, but if you want to use an NSString, you can simply add it to the attribute dictionary as-is. You don't have to convert it to anything. For example:
NSString *publicTag = @"com.apple.sample.publickey";
NSString *privateTag = @"com.apple.sample.privatekey";
The rest of the example stays exactly the same. In this case, there is no need for the C string literals at all.
Obtaining a char* (C string) from an NSString isn't the tricky part. (BTW, I'd also suggest UTF8String, it's much simpler.) The Apple-supplied code works because it's assigning a C string literal to the static const array variables. Assigning the result of a function or method call to a const will probably not work.
I recently answered an SO question about defining a constant in Objective-C, which should help your situation. You may have to compromise by getting rid of the const modifier. If it's declared static, you at least know that nobody outside the compilation unit where it's declared can reference it, so just make sure you don't let a reference to it "escape" such that other code could modify it via a pointer, etc.
However, as @Jason points out, you may not even need to convert it to a char* at all. The sample code creates an NSData object for each of these strings. You could just do something like this within the code (replacing steps 1 and 3):
NSData* publicTag = [@"com.apple.sample.publickey" dataUsingEncoding:NSUTF8StringEncoding];
NSData* privateTag = [@"com.apple.sample.privatekey" dataUsingEncoding:NSUTF8StringEncoding];
That sure seems easier to me than dealing with the C arrays if you already have an NSString.
Try this:
NSString *newString = @"This is a test string.";
const char *theString;
theString = [newString cStringUsingEncoding:[NSString defaultCStringEncoding]];