How to compare two NSStrings efficiently - objective-c

I know it is possible to use the methods compare: and isEqualToString:, and I suppose isEqualToString: is the most efficient one if you know both objects are strings. But my question is: is there a more efficient way to do it, such as comparing character by character?

By reading the documentation:
The comparison uses the canonical representation of strings, which for a particular string is the length of the string plus the Unicode characters that make up the string. When this method compares two strings, if the individual Unicodes are the same, then the strings are equal, regardless of the backing store. “Literal” when applied to string comparison means that various Unicode decomposition rules are not applied and Unicode characters are individually compared. So, for instance, “Ö” represented as the composed character sequence “O” and umlaut would not compare equal to “Ö” represented as one Unicode character.
and:
When you know both objects are strings, this method is a faster way to check equality than isEqual:.
it seems that isEqualToString: is the best method available for comparing strings, and that it does exactly what you need: first it checks the length (if two strings have different lengths, there is no need to check each character), and if the lengths are the same it compares each character. Simple and efficient!
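To illustrate the length-then-characters idea, here is a minimal sketch in plain C (which also compiles as Objective-C). It is not Foundation's actual implementation; `utf16_string` and `utf16_literal_equal` are hypothetical names that model a string as a counted array of UTF-16 units.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a string's contents: a counted array of UTF-16 units. */
typedef struct {
    const uint16_t *units;
    size_t length;
} utf16_string;

/* Literal comparison: unequal lengths can never match, so check them first
   and only compare the units themselves when the lengths agree. */
bool utf16_literal_equal(utf16_string a, utf16_string b)
{
    assert(a.units != NULL || a.length == 0);
    assert(b.units != NULL || b.length == 0);
    if (a.length != b.length) return false;
    if (a.length == 0) return true;
    return memcmp(a.units, b.units, a.length * sizeof(uint16_t)) == 0;
}
```

In this model, "Ö" stored as the single unit U+00D6 and "Ö" stored as U+004F followed by U+0308 compare unequal, which matches the literal comparison described in the documentation quote.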

isEqualToString: is faster than isEqual: if you know both objects are strings, as the documentation states.

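You could try converting both strings to C strings and then using strcmp. I doubt it will actually be any quicker, though. Note that strcmp returns 0 when the strings are equal, so the result has to be compared against 0:
const char *str1 = [myNSString1 UTF8String];
const char *str2 = [myNSString2 UTF8String];
// strcmp returns 0 when the two byte sequences match
BOOL isEqual = (strcmp(str1, str2) == 0);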

Related

Store/retrieve in/from variable number of binary values in a hexadecimal string with Objective-C?

I want to take a hexadecimal string and get an NSArray of bit columns (@[@(1), @(3), @(156)]) that are NOT 0 in the binary value of that string.
Similarly, I want to take an NSArray of bit column indices @[@(23), @(52), @(53), @(129)] and generate a hexadecimal string for those.
Is there a reasonably efficient way of doing this?
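One possible approach, sketched in plain C (usable from Objective-C as-is). The bit numbering is an assumption, since the question doesn't fix one: bit 0 is taken to be the least significant bit of the last hex digit. The names `hex_digit_value`, `hex_set_bits` and `bits_to_hex` are made up for illustration, and wrapping the results in NSNumber/NSArray is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 0..15 for a hex digit, or -1 for any other character. */
int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Stores the indices of all set bits in out (ascending) and returns how many
   were found, or -1 on an invalid digit or if out is too small. */
long hex_set_bits(const char *hex, size_t *out, size_t out_cap)
{
    assert(hex != NULL && out != NULL);
    size_t digits = strlen(hex);
    size_t count = 0;
    for (size_t i = 0; i < digits; i++) {
        /* Walk from the last digit (lowest bits) towards the first. */
        int value = hex_digit_value(hex[digits - 1 - i]);
        if (value < 0) return -1;
        for (int bit = 0; bit < 4; bit++) {
            if (value & (1 << bit)) {
                if (count == out_cap) return -1;
                out[count++] = i * 4 + (size_t)bit;
            }
        }
    }
    return (long)count;
}

/* Writes an uppercase hex string with exactly the given bits set.
   Returns 0 on success, or -1 if buf is too small. */
int bits_to_hex(const size_t *bits, size_t count, char *buf, size_t buf_cap)
{
    assert(buf != NULL && (bits != NULL || count == 0));
    size_t max_bit = 0;
    for (size_t i = 0; i < count; i++)
        if (bits[i] > max_bit) max_bit = bits[i];
    size_t digits = max_bit / 4 + 1;
    if (buf_cap < digits + 1) return -1;
    memset(buf, 0, digits);            /* hold raw nibble values first */
    for (size_t i = 0; i < count; i++)
        buf[digits - 1 - bits[i] / 4] |= (char)(1 << (bits[i] % 4));
    for (size_t d = 0; d < digits; d++)
        buf[d] = "0123456789ABCDEF"[(unsigned char)buf[d]];
    buf[digits] = '\0';
    return 0;
}
```

Under that numbering, `hex_set_bits("A5", ...)` yields the indices 0, 2, 5, 7, and passing those indices to `bits_to_hex` gives back "A5". Both functions do work proportional to the number of digits or indices, so strings with hundreds of bits should not be a problem.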

How to access a character in NSMutableString Objective-C

I have an instance of NSMutableString called MyMutableStr and I want to access its character at index 7.
For example:
unsigned char cMy = [(NSString*) MyMutableStr characterAtIndex:7];
I think this is an ugly way; it's too much code.
My question is: is there a simpler way in Objective-C to access a character in an NSMutableString?
Like, in C language we can access a character of a string using [ ] operator:
unsigned char cMy = MyMutableStr[7];
The way to do it is characterAtIndex:, but you don't need the cast to an NSString pointer, since NSMutableString is a subclass of NSString. So it isn't that long, but if you still find it uncomfortable, I suggest using UTF8String to obtain a C string that you can index with the brackets operator:
const char* cString= [MyMutableStr UTF8String];
char first= cString[0];
But remember this (taken from NSString class reference):
The returned C string is automatically freed just as a returned object would be released; you should copy the C string if it needs to store it outside of the autorelease context in which the C string is created.
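A caveat worth keeping in mind with the UTF8String approach: the brackets index bytes, not characters, so for non-ASCII text cString[i] need not line up with what characterAtIndex: returns for the same index. The plain C sketch below shows this on a UTF-8 encoded "cafés"; the helper names `utf8_is_continuation` and `utf8_code_point_count` are hypothetical, and the counting assumes well-formed UTF-8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True for a UTF-8 continuation byte (bit pattern 10xxxxxx). */
bool utf8_is_continuation(char byte)
{
    return ((unsigned char)byte & 0xC0) == 0x80;
}

/* Counts code points in a NUL-terminated, well-formed UTF-8 string
   by counting every byte that is not a continuation byte. */
size_t utf8_code_point_count(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (!utf8_is_continuation(*s)) count++;
    }
    return count;
}
```

For the bytes "caf\xC3\xA9s" the string holds 5 code points in 6 bytes: index 4 lands on the second byte of "é", and the "s" sits at byte index 5 rather than 4. For pure ASCII content the two indexings coincide.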
As others said, use characterAtIndex:, but there are a few things you might want to consider carefully.
First, you're dealing with a mutable string. You want to be careful to avoid it changing out from under you. One way is to make an immutable copy and use that for the operation.
Second, you're dealing with Unicode, so you may want to consider normalizing your string to a precomposed form, as some visual representations may consist of more than one actual unichar. That's often a stumbling block for folks.

How do I convert a unicode code point range into an NSString character range?

I have an NSString and a unicode code point range that represents a specific section of the text in that NSString. Since the characters in that NSString do not correspond one-to-one with code points, I need to somehow convert my code point range into the corresponding character range. How do I do this?
I know I can use the NSString method -rangeOfComposedCharacterSequencesForRange: to convert a character range to a grapheme cluster range, but what I want to do is sort of the opposite of that, and I can't find an inverse of that method in the APIs. And even if there was such a method available, I don't think this is exactly what I'm looking for, since (if I understand this correctly) a grapheme cluster is not the same thing as a unicode code point, and can in fact be composed of more than one code point.
What you have is kind of mixed data from two different worlds. You might typically get a Unicode code point range along with a UTF-32 string (where the correspondence is one-to-one) so that extracting the substring would be trivial. You have two options:
1. Work in the UTF-32 world before you put the data into an NSString
2. Convert the Unicode code point range into a UTF-16 unit range
I assume from your question that #2 is the easiest option in your case.
As you say, characters in an NSString do not correspond one-to-one with Unicode code points, since an NSString character is a UTF-16 unit. However, a Unicode code point corresponds to exactly 1 or 2 characters in an NSString. You can fairly easily write your own range conversion routine by iterating through the NSString characters and counting Unicode code points. This is made somewhat easier by the fact that you don't even care about the endianness of the UTF-16 data, since valid BMP characters, lead surrogates, and trail surrogates are disjoint. CFString provides some functions to determine what each character is. So in pseudocode your counting would look like:
for each NSString character {
    // A high (lead) surrogate starts a two-unit pair; its low (trail) surrogate follows
    if (CFStringIsSurrogateHighCharacter(character))
    {
        Skip forward another character in the NSString (the paired low surrogate)
    }
    Increment count of Unicode code points stepped through
}
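The pseudocode above could be fleshed out roughly as follows. This is a sketch in plain C over a raw UTF-16 buffer (for instance one filled via getCharacters:range:), not a Foundation API; `unit_range`, `unit_offset_for_code_points` and `unit_range_for_code_point_range` are hypothetical names, and an unpaired surrogate is simply counted as one code point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t location;   /* offset in UTF-16 units */
    size_t length;     /* length in UTF-16 units */
} unit_range;

bool is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
bool is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Starting at start_unit, steps over code_points code points and stores the
   resulting unit offset. Returns false if the buffer ends first. */
bool unit_offset_for_code_points(const uint16_t *units, size_t unit_count,
                                 size_t start_unit, size_t code_points,
                                 size_t *out_unit)
{
    assert(units != NULL || unit_count == 0);
    size_t i = start_unit;
    while (code_points > 0) {
        if (i >= unit_count) return false;
        /* A well-formed surrogate pair is one code point in two units. */
        if (is_high_surrogate(units[i]) && i + 1 < unit_count &&
            is_low_surrogate(units[i + 1])) {
            i += 2;
        } else {
            i += 1;
        }
        code_points--;
    }
    *out_unit = i;
    return true;
}

/* Converts a code point range into the corresponding UTF-16 unit range. */
bool unit_range_for_code_point_range(const uint16_t *units, size_t unit_count,
                                     size_t cp_location, size_t cp_length,
                                     unit_range *out)
{
    size_t start, end;
    if (!unit_offset_for_code_points(units, unit_count, 0, cp_location, &start))
        return false;
    if (!unit_offset_for_code_points(units, unit_count, start, cp_length, &end))
        return false;
    out->location = start;
    out->length = end - start;
    return true;
}
```

For the units 0x0041, 0xD83D, 0xDE00, 0x0042 (an "A", U+1F600, and a "B"), the code point range {1, 1} maps to the unit range {1, 2}, which could then be used as an NSRange.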

NSString and unichar don't match well when it comes to Unicode

Apple's documentation states that
A string object is implemented as an array of Unicode characters
However, the unichar data type, which is likely an unsigned short behind the scenes, is only 16 bits wide, which makes it impossible to represent every Unicode character with a single unichar. How do I reconcile these two facts in my mind?
You are correct that Apple's docs incorrectly refer to Unicode characters when they really mean UTF-16 code units.
In the early days of Unicode it was hoped that it would not exceed 16 bits, but it has. Both Apple and Microsoft (and probably others) use 16-bit integers to represent "Unicode characters", even though some characters will have to be represented by surrogate pairs.
Various methods of NSString handle this case (plus combining characters) and return a range for a given character. E.g. -rangeOfCharacterFromSet:... and -rangeOfComposedCharacterSequences....
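To make the surrogate pair idea concrete, here is a small plain C sketch of the standard UTF-16 encoding arithmetic. The function names `utf16_encode` and `utf16_decode_pair` are hypothetical; only the bit manipulation itself comes from the UTF-16 definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encodes one code point as UTF-16. Returns the number of 16-bit units
   written (1 or 2), or 0 if the value is not an encodable code point. */
size_t utf16_encode(uint32_t code_point, uint16_t out[2])
{
    assert(out != NULL);
    if (code_point > 0x10FFFF) return 0;
    if (code_point >= 0xD800 && code_point <= 0xDFFF) return 0; /* reserved for surrogates */
    if (code_point < 0x10000) {
        out[0] = (uint16_t)code_point;       /* fits in a single unit */
        return 1;
    }
    uint32_t v = code_point - 0x10000;       /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (v >> 10));     /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (v & 0x3FF));   /* low surrogate: bottom 10 bits */
    return 2;
}

/* Recombines a high/low surrogate pair into the original code point. */
uint32_t utf16_decode_pair(uint16_t high, uint16_t low)
{
    return 0x10000 + (((uint32_t)high - 0xD800) << 10) + ((uint32_t)low - 0xDC00);
}
```

U+00D6 fits in one unit, while U+1F600 becomes the two units 0xD83D and 0xDE00, which is why a single such character contributes 2 to an NSString's length.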
It isn't certain that strings are stored using the unichar data type. "A string object is implemented as an array of Unicode characters" doesn't mean that in the source code it is stored as unichar *. You don't know how it is implemented, do you?
And what if unichar is not an unsigned short? What if it is a 32- or 64-bit data type?

Is it better to append a CString than an ObjC String?

I'm writing a bit of code doing string manipulation. In this particular situation, appending "?partnerId=30" to a URL for iTunes Affiliate linking. This is a raw string and completely static. I was thinking, is it better to do:
urlString = [urlString stringByAppendingFormat:@"%@", @"?partnerId=30"];
Or:
urlString = [urlString stringByAppendingFormat:@"%s", "?partnerId=30"];
I would think it's better to not instantiate an entire Objective-C object, but I've never seen it done that way.
Strings declared using the @"" syntax are constant and will already exist in memory by the time your code is running, thus there is no allocation penalty from using them.
You may find they are very slightly faster, as they know their own length, whereas C strings need to be looped through to find out their length.
You'd gain a more tangible (though still tiny) performance improvement by not using a format string:
urlString = [urlString stringByAppendingString:@"?partnerId=30"];
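The point about lengths can be illustrated with a small plain C model. This is not how NSString is implemented; `counted_string`, `COUNTED_LITERAL`, `append_counted` and `append_c_strings` are hypothetical names, and the URL is a placeholder. A string that carries its length can be appended with two copies, while a bare C string has to be scanned with strlen first.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A string that stores its length next to its bytes. */
typedef struct {
    const char *bytes;
    size_t length;
} counted_string;

/* For string literals the length is known at compile time. */
#define COUNTED_LITERAL(lit) { (lit), sizeof(lit) - 1 }

/* Appends b to a without scanning either string. Caller frees the result. */
char *append_counted(counted_string a, counted_string b)
{
    assert(a.bytes != NULL && b.bytes != NULL);
    char *result = malloc(a.length + b.length + 1);
    if (result == NULL) return NULL;
    memcpy(result, a.bytes, a.length);
    memcpy(result + a.length, b.bytes, b.length);
    result[a.length + b.length] = '\0';
    return result;
}

/* Same result for plain C strings, but both must be walked by strlen first. */
char *append_c_strings(const char *a, const char *b)
{
    counted_string ca = { a, strlen(a) };
    counted_string cb = { b, strlen(b) };
    return append_counted(ca, cb);
}
```

For a 13-character suffix like "?partnerId=30" the extra scan is negligible, which is consistent with the advice that the difference here is tiny.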
Both literal C strings and literal NSStrings are expressed as constant bits of memory. Neither requires an allocation on use.
Objective-C string literals are immortal objects. They are instantiated when your binary is loaded into memory. With that knowledge, the former form does not create a temporary NSString.
I honestly don't know which is faster in general, because it also depends on external conditions. An NSString may represent strings of multiple encodings (the default is UTF-16); if urlString has an encoding conversion to perform, it could be a performance hit for either approach. Either way, both will be quite fast. I wouldn't worry about this case unless you have many (e.g. thousands) of these to create and it is time critical, because their performance should be similar.
Since you are using the form: NSString = NSString+NSString, the NSString literal could be faster for nontrivial cases because the length is stored with the object and the encodings of both strings may already match the destination string. The C string used in your example would also be trivial to convert to another encoding, plus it is short.
C strings, as a more primitive type, could reduce your load times and/or memory usage if you need to define a lot of them.
For simple cases like this one, I'd just stick with NSString literals, unless the problem is much larger than the post implies.
If you need a C string representation as well for a given set of literals, then you may prefer to define C string literals. Defining C string literals may also force you to create temporary NSStrings based on the C strings. In this case, you may want to define one for each flavor or use CFString's 'create CFString with external buffer' APIs. Again, this would be for very unusual cases (a micro-optimization if you are really not going through huge sets of these strings).