Is it better to append a CString than an ObjC String? - objective-c

I'm writing a bit of code doing string manipulation. In this particular situation, appending "?partnerId=30" to a URL for iTunes Affiliate linking. This is a raw string and completely static. I was thinking, is it better to do:
urlString = [urlString stringByAppendingFormat:#"%#", #"?partnerId=30"];
Or:
urlString = [urlString stringByAppendingFormat:#"%s", "?partnerId=30"];
I would think it's better to not instantiate an entire Objective-C object, but I've never seen it done that way.

String declared using the #"" syntax are constant and will already exist in memory by the time your code is running, thus there is no allocation penalty from using them.
You may find they are very slightly faster, as they know their own length, whereas C strings need to be looped through to find out their length.
Through you'd gain a more tangible performance improvement (though still tiny) from not using a format string:
urlString = [urlString stringByAppendingString:#"?partnerId=30"];

Both literal C strings and literal NSStrings are expressed as constant bits of memory. Neither requires an allocation on use.

Objective-C string literals are immortal objects. They are instantiated when your binary is loaded into memory. With that knowledge, the former form does not create a temporary NSString.
I honestly don't know which is faster in general, because it also depends on external conditions; An NSString may represent strings of multiple encodings (the default is UTF-16), if urlString has an encoding conversion to perform, then it could be a performance hit for either approach. Either way, they will both be quite fast - I wouldn't worry about this case unless you have many (e.g. thousands) of these to create and it is time critical, because their performance should be similar.
Since you are using the form: NSString = NSString+NSString, the NSString literal could be faster for nontrivial cases because the length is stored with the object and the encodings of both strings may already match the destination string. The C string used in your example would also be trivial to convert to another encoding, plus it is short.
C strings, as a more primitive type, could reduce your load times and/or memory usage if you need to define a lot of them.
For simple cases, I'd just stick with NSString literals in this case, unless the problem is much larger than the post would imply.
If you need a C string representation as well for a given set of literals, then you may prefer to define C string literals. Defining C string literals may also force you to create temporary NSStrings based on the C strings. In this case, you may want to define one for each flavor or use CFString's 'create CFString with external buffer' APIs. Again, this would be for very unusual cases (a micro-optimization if you are really not going through huge sets of these strings).

Related

Can I use memchr safely with an internal UTF-8 char * returned from an NSString?

I'd like to use memchr instead of strlen to find the length of a C string potentially used as the backing string of an NSString. Is this safe to do, or do I risk reading memory that I don't own, causing a crash? Let's assume that the NSString will not be released before I'm done with the internal buffer.
memchr(s, 0, XXX) and strlen(s) should pretty much behave identically, save for mechr()'s ability to terminate after XXX bytes. But strnlen() can do that, too.
And that behavior is probably exactly what you don't want.
Neither function accounts for any kind of unicode encoding. Thus, the returned length will be the length-in-bytes and not the # of characters.
Use -length on the NSString if you want the string length. Beyond that, what are you trying to do?

Objective-C Declaring Variables Within Loop -- Performance

I've been teaching myself Objective-C via some lectures from iTunes U.
I like the course, but the professor routinely writes code like this:
while (index < [self.textToAnalyze length]) {
NSRange range;
id value = [self.textToAnalyze attribute:attributeName atIndex:index effectiveRange:&range];
....
}
I have mostly worked in C and C++, and there are a couple of things about this code that seem glaringly wrong to me from a performance / style perspective.
The code declares two variables (of type NSRange and id) inside the loop. A naive compiler would reserve space for each of these types of variables at each iteration through the loop.
The code calls the length selector ([self.textToAnalyze length]) in the loop condition. As the programmer, I know that the length of the textToAnalyze property does not change while the loop is iterating. However, I'm not convinced that the compiler will know this. Won't this code call the length selector every single iteration of the loop?
I know that compilers can be very crafty, but I think it is bad code to declare variables within a loop and call functions within the loop conditions. It could affect the performance, and in my mind, it is certainly poor style.
However, I am new to Objective-C, so here is my question:
Is this bad code, or are these Objective-C idioms that are fine to use?
The answer is: it depends.
For variables in loops, moving them out of the loop might improve performance of the loop, but now you've changed the scope and effected performance in another way. Which is better is definitely going to be situation-based.
But, as far as I'm concerned, a variable with a scope any wider than necessary is bad practice. Variables with unnecessarily wide scope and lead to user error. The amount of performance improvement you might get here is not worth ANY amount of debugging that could've been avoided with a narrower variable scope.
As for calling methods in loop conditions, again, this depends.
I'm quite certain that the length method of NSString is quite optimized. The length of a string is not analyzed every time length is called, and especially not so for NSString (as opposed to NSMutableString. If we're talking about an immutable string, calling length is simply returning an NSInteger value stored within the class and performance will be fine.
If, however, we're talking about an immutable string, we need to call length every time if it's important that we're only within the length of the string.
It's a mutable string. Even if this loop doesn't modify the string, it can be modified else where by anyone that has a reference to it. If you're concerned about performance, make an immutable copy of the string and use that immutable copy in the loop.
[myString length]
According to this answer, for both mutable and immutable versions, there is a variable within the class that stores the string's length. For immutable strings, this is calculated when the string is created and never changed. For mutable strings, this variable is calculated and set each time the string changes. At the end of the day, when you call string, it's purely returning the value of an internally stored int value in the class and will not be any slower than storing this length in a separate variable before the loop and comparing against this.
I think it is theoretically possible for some loop conditions to be cached, but in practice that doesn't seem to happen — the length method will be called every time through the loop.
Does that make this bad code? Not necessarily. The overhead of a function call is not that huge. If this loop needs to be tight, then yes, caching the string's length is a great idea. But in many cases, it just doesn't make any measurable difference, so less code is better code.
As for declaring variables in a loop, that isn't problematic. There are two mainstream Objective-C compilers in existence, and neither has any trouble compiling that to efficient code. I doubt even POC has trouble with that. Muddling your variables' scope just to please a hypothetical awful compiler isn't good code — it's premature optimization. (Incidentally, the same holds true for C++ too.)

Objective-C String Differences

What's the difference between NSString *myString = #"Johnny Appleseed" versus NSString *myString = [NSString stringWithString: #"Johnny Appleseed"]?
Where's a good case to use either one?
The other answers here are correct. A case where you would use +stringWithString: is to obtain an immutable copy of a string which might be mutable.
In the first case, you are getting a pointer to a constant NSString. As long as your program runs, myString will be a valid pointer. In the second, you are creating an autoreleased NSString object with a constant string as a template. In that case, myString won't point to a real object anymore after the current run loop ends.
Edit: As many people have noted, the normal implementation of stringWithString: just returns a pointer to the constant string, so under normal circumstances, your two examples are exactly the same. There is a bit of a subtle difference in that Objective-C allows methods of a class to be replaced using categories and allows whole classes to be replaced with class_poseAs. In those cases, you might run into a non-default implementation of stringWithString:, which may have different semantics than you expect it to. Just because it happens to be that the default implementation does the same thing as a simple assignment doesn't mean that you should rely on subtle implementation-specific behaviour in your program - use the right case for the particular job you're trying to do.
Other than syntax and a very very minor difference in performance, nothing. The both produce the exact same pointer to the exact same object.
Use the first example. It's easier to read.
In practice, nothing. You wouldn't ever use the second form, really, unless you had some special reason to. And I can't think of any right now.
(See Carl's answer for the technical difference.)

How does a NSMutableString exactly work?

I know all instances of NSString are inmutable. If you assign a new value to a string new memory is addressed and the old string will be lost.
But if you use NSMutableString the string will always keep his same address in memory, no matter what you do.
I wonder how this exactly works. With methods like replaceCharactersInRange I can even add more characters to a string so I need more memory for my string. What happens to the objects in memory that follow the string? Are they all relocated and put somewhere else in memory? I don't think so. But what is really going on?
I know all instances of NSString are
inmutable. If you assign a new value
to a string new memory is addressed
and the old string will be lost.
That isn't how mutability works, nor how references to NSStrings work. Nor how pointers work.
A pointer to an object -- NSString *a; declares a variable a that is a pointer to an object -- merely holds the address in memory of the object. The actual object is [generally] an allocation on the heap of memory that contains the actual object itself.
In those terms, there is really no difference at runtime between:
NSString *a;
NSMutableString *b;
Both are references to -- addresses of -- some allocation in memory. The only difference is during compile time, b will be treated differently than a and the compiler will not complain if, say, you use NSMutableString methods when calling b (but would when calling a).
As far as how NSMutableString works, it contains a buffer (or several buffers -- implementation detail) internally that contain the string data. When you call one of the methods that mutate the string's contents, the mutable string will re-allocate its internal storage as necessary to contain the new data.
Objects do not move in memory. Once allocated, an allocation will never move -- the address of the object or allocation will never change. The only semi-exception is when you use something like realloc() which might return a different address. However, that is really just a sequence of free(); malloc(); memcpy();.
I'd suggest you revisit the Objective-C Programming Guide or, possibly, a C programming manual.
the NSMutableString works just like the C++ std::string do. i don't know exactly how they work, but there are two popular approaches:
concating
you create a struct with two variables. one char and one pointer.
when a new char(or more are added) you create a new instance of the struct, and add the address to the last struct instance of the string. this way you have a bunch of structs with a pointer directing to the next struct.
copy & add
the way most newbies will go. not the worst, but maybe the slowest.
you save a "normal" unmutable string. if a new char is added, you allocate a area in the memory with the size of the old string +1, copy the old string and concate the new char. that's a very simple approach, isn't it?
a bit more advanced version would be to create the new string with a size +50, and just add the chars and a new null at the end, don't forget the to overwrite the old null. this way it's more efficient for string with a lot of changes.
as i said before, i don't know how std::string or NSMutableString approaches this issue. but these are the most common ways.

What's the difference between a string constant and a string literal?

I'm learning objective-C and Cocoa and have come across this statement:
The Cocoa frameworks expect that global string constants rather than string literals are used for dictionary keys, notification and exception names, and some method parameters that take strings.
I've only worked in higher level languages so have never had to consider the details of strings that much. What's the difference between a string constant and string literal?
In Objective-C, the syntax #"foo" is an immutable, literal instance of NSString. It does not make a constant string from a string literal as Mike assume.
Objective-C compilers typically do intern literal strings within compilation units — that is, they coalesce multiple uses of the same literal string — and it's possible for the linker to do additional interning across the compilation units that are directly linked into a single binary. (Since Cocoa distinguishes between mutable and immutable strings, and literal strings are always also immutable, this can be straightforward and safe.)
Constant strings on the other hand are typically declared and defined using syntax like this:
// MyExample.h - declaration, other code references this
extern NSString * const MyExampleNotification;
// MyExample.m - definition, compiled for other code to reference
NSString * const MyExampleNotification = #"MyExampleNotification";
The point of the syntactic exercise here is that you can make uses of the string efficient by ensuring that there's only one instance of that string in use even across multiple frameworks (shared libraries) in the same address space. (The placement of the const keyword matters; it guarantees that the pointer itself is guaranteed to be constant.)
While burning memory isn't as big a deal as it may have been in the days of 25MHz 68030 workstations with 8MB of RAM, comparing strings for equality can take time. Ensuring that most of the time strings that are equal will also be pointer-equal helps.
Say, for example, you want to subscribe to notifications from an object by name. If you use non-constant strings for the names, the NSNotificationCenter posting the notification could wind up doing a lot of byte-by-byte string comparisons when determining who is interested in it. If most of these comparisons are short-circuited because the strings being compared have the same pointer, that can be a big win.
Some definitions
A literal is a value, which is immutable by definition. eg: 10
A constant is a read-only variable or pointer. eg: const int age = 10;
A string literal is a expression like #"". The compiler will replace this with an instance of NSString.
A string constant is a read-only pointer to NSString. eg: NSString *const name = #"John";
Some comments on the last line:
That's a constant pointer, not a constant object1. objc_sendMsg2 doesn't care if you qualify the object with const. If you want an immutable object, you have to code that immutability inside the object3.
All #"" expressions are indeed immutable. They are replaced4 at compile time with instances of NSConstantString, which is a specialized subclass of NSString with a fixed memory layout5. This also explains why NSString is the only object that can be initialized at compile time6.
A constant string would be const NSString* name = #"John"; which is equivalent to NSString const* name= #"John";. Here, both syntax and programmer intention are wrong: const <object> is ignored, and the NSString instance (NSConstantString) was already immutable.
1 The keyword const applies applies to whatever is immediately to its left. If there is nothing to its left, it applies to whatever is immediately to its right.
2 This is the function that the runtime uses to send all messages in Objective-C, and therefore what you can use to change the state of an object.
3 Example: in const NSMutableArray *array = [NSMutableArray new]; [array removeAllObjects]; const doesn't prevent the last statement.
4 The LLVM code that rewrites the expression is RewriteModernObjC::RewriteObjCStringLiteral in RewriteModernObjC.cpp.
5 To see the NSConstantString definition, cmd+click it in Xcode.
6 Creating compile time constants for other classes would be easy but it would require the compiler to use a specialized subclass. This would break compatibility with older Objective-C versions.
Back to your quote
The Cocoa frameworks expect that global string constants rather than
string literals are used for dictionary keys, notification and
exception names, and some method parameters that take strings. You
should always prefer string constants over string literals when you
have a choice. By using string constants, you enlist the help of the
compiler to check your spelling and thus avoid runtime errors.
It says that literals are error prone. But it doesn't say that they are also slower. Compare:
// string literal
[dic objectForKey:#"a"];
// string constant
NSString *const a = #"a";
[dic objectForKey:a];
In the second case I'm using keys with const pointers, so instead [a isEqualToString:b], I can do (a==b). The implementation of isEqualToString: compares the hash and then runs the C function strcmp, so it is slower than comparing the pointers directly. Which is why constant strings are better: they are faster to compare and less prone to errors.
If you also want your constant string to be global, do it like this:
// header
extern NSString *const name;
// implementation
NSString *const name = #"john";
Let's use C++, since my Objective C is totally non-existent.
If you stash a string into a constant variable:
const std::string mystring = "my string";
Now when you call methods, you use my_string, you're using a string constant:
someMethod(mystring);
Or, you can call those methods with the string literal directly:
someMethod("my string");
The reason, presumably, that they encourage you to use string constants is because Objective C doesn't do "interning"; that is, when you use the same string literal in several places, it's actually a different pointer pointing to a separate copy of the string.
For dictionary keys, this makes a huge difference, because if I can see the two pointers are pointing to the same thing, that's much cheaper than having to do a whole string comparison to make sure the strings have equal value.
Edit: Mike, in C# strings are immutable, and literal strings with identical values all end pointing at the same string value. I imagine that's true for other languages as well that have immutable strings. In Ruby, which has mutable strings, they offer a new data-type: symbols ("foo" vs. :foo, where the former is a mutable string, and the latter is an immutable identifier often used for Hash keys).