Xcode - EXC_BAD_ACCESS when concatenating a large string - objective-c

I'm getting an EXC_BAD_ACCESS when concatenating a large string.
I've read from a feed, and to create my web view I build up my string like this:
NSString *pageData = @"<h1>header</h1>";
pageData = [pageData stringByAppendingFormat:@"<p>"];
pageData = [pageData stringByAppendingFormat:@"self.bodyText"];
pageData = [pageData stringByAppendingFormat:@"</p>"];
etc.
The problem I've got is that self.bodyText is 21,089 characters including spaces, according to a character count in Word.
Is there a better method for doing this?
Thanks

You would definitely want to use NSMutableString for something like this:
NSMutableString *pageData = [NSMutableString stringWithCapacity:0];
[pageData appendFormat:@"<h1>header</h1>"];
[pageData appendFormat:@"<p>"];
...
NSMutableString is designed for this kind of sequential concatenation, whereas the basic NSString class is really not meant to be used in this manner. Your original code allocates a new NSString every time you call stringByAppendingFormat:, and then proceeds to copy into it all of the thousands of characters you had already appended. This could result in an out-of-memory error, since each temporary string is larger than the last and the total amount copied grows quadratically with the number of calls.
Using NSMutableString will not re-copy all of the string data when you call appendFormat:, since the mutable string maintains an internal buffer and simply tacks new strings on to the end of it. Depending on the size of your string, you may want to reserve a huge chunk of memory ahead of time (use a meaningful number for the ...WithCapacity: argument). But there is no need to go that route unless you actually run into performance issues.
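To make the "internal buffer" idea concrete, here is a minimal sketch in plain C (which also compiles inside an Objective-C file). It is only an analogy: the TextBuffer type and the text_buffer_* functions are hypothetical names invented for this illustration, and NSMutableString's real storage is not documented to work this way. The point is that an append copies only the new bytes unless the buffer has to grow.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable text buffer; illustrative only, not NSMutableString's real layout. */
typedef struct {
    char  *data;      /* NUL-terminated contents */
    size_t length;    /* bytes used, excluding the terminating NUL */
    size_t capacity;  /* bytes allocated */
} TextBuffer;

/* Allocates the initial storage. Returns 0 on success, -1 on allocation failure. */
static int text_buffer_init(TextBuffer *buf, size_t capacity) {
    if (capacity == 0) capacity = 16;   /* a capacity of 0 just means "pick a default" */
    buf->data = malloc(capacity);
    if (buf->data == NULL) return -1;
    buf->data[0] = '\0';
    buf->length = 0;
    buf->capacity = capacity;
    return 0;
}

/* Appends text in place. Existing contents are moved only if realloc has to relocate them. */
static int text_buffer_append(TextBuffer *buf, const char *text) {
    size_t extra = strlen(text);
    size_t needed = buf->length + extra + 1;
    if (needed > buf->capacity) {
        size_t new_capacity = buf->capacity;
        while (new_capacity < needed) new_capacity *= 2;  /* doubling keeps regrowth rare */
        char *grown = realloc(buf->data, new_capacity);
        if (grown == NULL) return -1;
        buf->data = grown;
        buf->capacity = new_capacity;
    }
    memcpy(buf->data + buf->length, text, extra + 1);     /* copies the NUL as well */
    buf->length += extra;
    return 0;
}

/* Releases the storage and resets the buffer to an empty state. */
static void text_buffer_free(TextBuffer *buf) {
    free(buf->data);
    buf->data = NULL;
    buf->length = 0;
    buf->capacity = 0;
}
```

Passing a realistic initial capacity to text_buffer_init plays the same role as the ...WithCapacity: argument: it avoids the regrowth steps, nothing more.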

There are a few problems with your sample code:
You should be using an NSMutableString to build up an output string by appending multiple parts. NSString is an immutable class, which means that each time you call stringByAppendingFormat: you incur the overhead of creating an additional NSString object that will later need to be released by the autorelease pool.
NSMutableString * pageData = [NSMutableString stringWithCapacity:0];
You should use appendString: on your NSMutableString to append content, instead of stringByAppendingFormat: or appendFormat:. The format methods are intended for creating new strings based on a format specifier which includes special fields as placeholders. See Formatting String Objects for more details. When you use stringByAppendingFormat: with just a literal string, as your code does, you incur the overhead of parsing the string for the nonexistent placeholders, and more importantly, if the string happens to have a placeholder (or something that looks like one) in it, you'll end up with the EXC_BAD_ACCESS crash that you are getting. Most likely this is happening when your bodyText is appended. Thus if you simply want to append a '<p>' to your NSMutableString, do something like this:
[pageData appendString:@"<p>"];
If you want to append the contents of the self.bodyText property to the string, you shouldn't put the name of the property inside a string literal (i.e. @"self.bodyText" is the literal string "self.bodyText", not the contents of the property). Try:
[pageData appendString:self.bodyText];
As an example, you could actually combine all three lines of your sample code by using a format specification:
pageData = [pageData stringByAppendingFormat:@"<p>%@</p>", self.bodyText];
In the format specification, %@ is a placeholder that means insert the result of sending the description or descriptionWithLocale: message to the object. For an NSString this is simply the contents of the string.

I doubt the length of the string is really a problem. A 50,000-character string is only about 100 KB. But you want to be very careful about using format strings. If your string contains something that looks like a formatting specifier, there had better be a corresponding argument or you'll get garbage if you're lucky and a crash if you're not. I suspect this is the error, since there is no other obvious problem from your description. Be careful about what you put in there, and avoid ever putting dynamic text in a format string: just put a %@ in the format string and pass the dynamic text as an argument.
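The same rule holds for C's printf family, which the NSString format methods are modelled on. Here is a minimal sketch in plain C (it also compiles in an Objective-C file); wrap_in_paragraph is a hypothetical helper name made up for this example. The format string is a fixed literal and the dynamic text is only ever passed as an argument, so any % inside it is copied verbatim instead of being parsed.

```c
#include <stdio.h>

/* Hypothetical helper: wraps arbitrary text in <p>...</p>.
   Returns what snprintf returns: the length the full result needs,
   which may exceed out_size - 1 if the output was truncated. */
static int wrap_in_paragraph(char *out, size_t out_size, const char *body) {
    /* Safe: "<p>%s</p>" is the only format string; body is data, not format. */
    return snprintf(out, out_size, "<p>%s</p>", body);

    /* Unsafe equivalent, shown only as a comment:
       snprintf(out, out_size, body);
       Here a "%n" or "%s" inside body would be interpreted as a specifier
       with no matching argument, which is undefined behaviour. */
}
```

A body such as "100%n off: http://a.b/c%20d" passes through unchanged in the safe form, which is exactly what you want for feed content that may contain URLs.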

Use appendString: instead of appendFormat: when dealing with arbitrary strings.
pageData = [pageData stringByAppendingString:@"<p>"];
pageData = [pageData stringByAppendingString:self.bodyText];
pageData = [pageData stringByAppendingString:@"</p>"];
or do not use an arbitrary string as the format:
pageData = [pageData stringByAppendingFormat:@"<p>%@</p>", self.bodyText];
If you are building the string up in pieces, use NSMutableString instead of several stringBy calls.
Remember that % is a special character both in format strings and in URL escapes, so if bodyText contains a URL and is used as the format string, it could easily cause a crash.

Related

How to discover if a c-string can be encoded to NSString with a given encoding

I am trying to implement code that converts const char * to NSString. I would like to try multiple encodings in a specified order until I find one that works. Unfortunately, all the initWith... methods on NSString say that the results are undefined if the encoding doesn't work.
In particular, (sometimes) I would like to try first to encode as NSMacOSRomanStringEncoding which never seems to fail. Instead it just encodes gobbledygook. Is there some kind of check I can perform ahead of time? (Like canBeConvertedToEncoding but in the other direction?)
Instead of trying encodings one by one until you find a match, consider asking NSString to help you out here by using +[NSString stringEncodingForData:encodingOptions:convertedString:usedLossyConversion:], which, given string data and some options, may be able to detect the encoding for you, and return it (along with the actual decoded string).
Specifically for your use-case, since you have a list of encodings you'd like to try, the encodingOptions parameter will allow you to pass those encodings in using the NSStringEncodingDetectionSuggestedEncodingsKey.
So, given a C string and some possible encoding options, you might be able to do something like:
NSString *decodeCString(const char *source, NSArray<NSNumber *> *encodings) {
    NSData * const cStringData = [NSData dataWithBytesNoCopy:(void *)source length:strlen(source) freeWhenDone:NO];
    NSString *result = nil;
    BOOL usedLossyConversion = NO;
    NSStringEncoding determinedEncoding = [NSString stringEncodingForData:cStringData
            encodingOptions:@{NSStringEncodingDetectionSuggestedEncodingsKey: encodings,
                              NSStringEncodingDetectionUseOnlySuggestedEncodingsKey: @YES}
            convertedString:&result
            usedLossyConversion:&usedLossyConversion];
    /* Decide whether to do anything with `usedLossyConversion` and `determinedEncoding`. */
    return result;
}
Example usage:
NSString *result = decodeCString("Hello, world!", @[@(NSShiftJISStringEncoding), @(NSMacOSRomanStringEncoding), @(NSASCIIStringEncoding)]);
NSLog(@"%@", result); // => "Hello, world!"
If you don't 100% care about using only the list of encodings you want to try, you can drop the NSStringEncodingDetectionUseOnlySuggestedEncodingsKey option.
One thing to note about the encoding array you pass in: although the documentation doesn't promise that the suggested encodings are attempted in order, spelunking through the disassembly of the (current) method implementation shows that the array is enumerated using fast enumeration (i.e., in order). I can imagine that this could change in the future (or have been different in the past) so if this is somehow a hard requirement for you, you could theoretically work around it by repeatedly calling +stringEncodingForData:encodingOptions:convertedString:usedLossyConversion: one encoding at a time in order, but this would likely be incredibly expensive given the complexity of this method.

How to prevent xcode treating <# #> as a code snippet placeholder

I want to include <# #> in a literal string in an Objective C program. However, Xcode automatically replaces the string thinking it should be a placeholder as used in code snippets.
e.g.
NSString *pattern = @"<#foo#>";
Try putting this into Xcode and it redisplays as a placeholder for foo.
In the actual program the string is an RE like this <#(.*)#> and surprisingly, the code actually works as intended - the problem only manifests itself in the display in Xcode. The real problem is that this creates a fragile situation, it is far too easy to cause the text to be replaced accidentally when in the editor.
As a workaround I can construct the string from its parts
NSString *p1 = @"<#(.*)";
NSString *p2 = @"#>";
NSString *pattern = [NSString stringWithFormat:@"%@%@",p1,p2];
but that is not very satisfactory.
Does anyone know a better way to override this behaviour in Xcode?
You can avoid the bad behavior by taking advantage of string concatenation, e.g.
NSString *str = @"<" "#foo#" ">";
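This works because adjacent string literals are merged by the compiler, a rule Objective-C inherits from C, so there is no runtime cost. Here is a small sketch in plain C using the regular expression from the question; placeholder_pattern is a hypothetical name chosen for this example.

```c
/* Adjacent string literals are joined at compile time, so the characters
   '<' and '#' never sit next to each other in the source text and the
   editor has nothing to mistake for a snippet placeholder. */
static const char *placeholder_pattern(void) {
    return "<" "#(.*)#" ">";
}
```

The returned string is eight characters long and is identical to the pattern written as a single literal.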
Swift 4, Xcode 9 - you just need to separate out or replace one character to make the sequence work. Still not perfect...
string concatenation:
let pattern = "<" + "#foo#>"
string interpolation:
let pattern = "\("<")#foo#>"
percent encoding:
let pattern = "%3C#foo#>".removingPercentEncoding
Funny fact: printing the pattern to the console displays as a placeholder, foo.
Not so funny fact: you can't print the pattern to a log. Since it's created via concatenation or other runtime processing, it doesn't meet the StaticString type required in a log message.
Update: you can use the pattern in a log (though, again, it displays as a placeholder - argh!). To get past the StaticString issue, you can use %@ in your log statement:
let pattern = "<" + "#foo#>" // or whichever version you like best
os_log("%@", log: .default, type: .debug, pattern)

NSString and NSMutableString concatenation

I have three strings (an NSString, an NSMutableString, and another NSString) which I need to concatenate into a mutable string, in that order, to display as the source for a UIWebView. Coming from a PHP/JavaScript/HTML background, my knowledge of concatenation is pretty much this:
var concatenatedString = string1 + string2 + string3;
I presume that sort of thing won't work in Objective-C, so I'm wondering how to go about pulling them all together properly.
To give a bit of setting for this, the first string (NSString) is the header and canvas element of a web page, the second string (NSMutableString) is javascript from a text field that the user can define to manipulate the canvas element, and the third string (NSString) is the end tags of the web page.
Also, rather than initially creating the NSMutableString, should I just reference UITextView.text to get the user's text when concatenating the whole thing, or should I pull the text from the UITextView first?
NSMutableString *concatenatedString = [[NSString stringWithFormat:@"%@%@%@", string1, string2, string3] mutableCopy];
The other two answers are correct in that they answer the question as you asked it. But by your description of what you want to do there is a much easier way. Use a format.
Assuming string1 and string3 will always be the same and only string2 will change, which is what it sounds like you are doing, you can write something like this:
static NSString *formatString = @"String1Text%@String3Text";
NSString *completeString = [NSString stringWithFormat:formatString, self.myTextFieldName.text];
NSLog(@"%@", completeString);
The %@ in the format says to insert the description of the object following the format. (The description of an NSString is the string itself.)
Assuming you have a UITextField named myTextFieldName, that currently contains the text 'String2Text' Then this will be the output:
'String1TextString2TextString3Text'
In this way you only create 1 instance of an NSString format for the whole class no matter how many times you call this code.
To me it sounds like you don't need a mutable string at all. Feel free to leave a comment if I misunderstood anything.
Response to comment:
I'm not sure how you are implementing 'moves to test it out again' but, let's say you have a button named 'testJavaScript'. The IBAction method connected to that button would have the first two lines in it. So each time you pushed the button it would make a new formatted NSString filled with the current contents of the textfield. Once this string was formed it could not be changed. But it won't matter since next time it will make another.
NSString *concatenatedString = [string1 stringByAppendingFormat:@"%@%@", string2, string3];
You can make the resulting string mutable (if you really need to) by adding mutableCopy as shown in the answer by @Vinnie.

Unihan: combining UTF-8 chars

I am using data that involves Chinese Unihan characters in an Objective-C app. I am using a voice recognition program (cmusphinx) that returns a phrase from my data. It returns UTF-8 characters and when returning a Chinese character (which is three bytes) it separates it into three separate characters.
Example: When I want 人, I see: ‰∫∫. This is the proper encoding (E4 BA BA), but my code sees the returned value as three separate characters rather than one.
Actually, my function is receiving the phrase as an NSString (due to a wrapper), which uses UTF-16. I tried using Objective-C's built-in conversion methods (to UTF-8 and from UTF-16), but these keep my string as three characters.
How can I decode these three separate characters into the one utf-8 codepoint for the Chinese character?
Or how can I properly encode it?
This is code fragment dealing with the cstring returned from sphinx and its encoding to a NSString:
const char * hypothesis = ps_get_hyp(pocketSphinxDecoder, &recognitionScore, &utteranceID);
NSString *hypothesisString = [[NSString alloc] initWithCString:hypothesis encoding:NSMacOSRomanStringEncoding];
Edit: From looking at the addition to your post, you actually do have control over the string encoding. In that case, why are you creating the string with NSMacOSRomanStringEncoding when you're expecting UTF-8? Just change that to NSUTF8StringEncoding.
It sounds like what you're saying is you're being given an NSString that contains UTF-8 data that's being interpreted as a single-byte encoding (e.g. ISO-Latin-1, MacRoman, etc). I'm assuming here that you have no control over the code that creates the NSString, because if you did then the solution is just to change the encoding it's initializing with.
In any case, what you're asking for is a way to take the data in the string and convert it back to UTF-8. You can do this by creating an NSData from the NSString using whatever encoding it was originally created with (you need to know this much, at least, or it won't work), and then you can create a new NSString from the same data using UTF-8.
From the example character you gave (人) it looks like it's being interpreted as MacRoman, so let's go with that. The following code should convert it back:
- (NSString *)fixEncodingOfString:(NSString *)input {
    CFStringEncoding cfEncoding = kCFStringEncodingMacRoman;
    NSStringEncoding encoding = CFStringConvertEncodingToNSStringEncoding(cfEncoding);
    // Recover the original bytes by re-encoding with the wrongly assumed encoding.
    NSData *data = [input dataUsingEncoding:encoding];
    if (!data) {
        // the string wasn't actually in MacRoman
        return nil;
    }
    // Reinterpret those same bytes as UTF-8.
    NSString *output = [[[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding] autorelease];
    return output;
}
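To see why the three bytes E4 BA BA are one character and not three, here is a minimal sketch in plain C of how a three-byte UTF-8 sequence maps to a single code point. decode_utf8_3 is a hypothetical helper written for this illustration; it handles only the three-byte form and is not a general UTF-8 decoder.

```c
#include <stdint.h>

/* Hypothetical helper: decodes one three-byte UTF-8 sequence of the form
   1110xxxx 10xxxxxx 10xxxxxx. Returns the code point, or 0 if the bytes
   are not a well-formed three-byte sequence. */
static uint32_t decode_utf8_3(const unsigned char bytes[3]) {
    if ((bytes[0] & 0xF0) != 0xE0) return 0;              /* lead byte must be 1110xxxx */
    if ((bytes[1] & 0xC0) != 0x80) return 0;              /* continuation must be 10xxxxxx */
    if ((bytes[2] & 0xC0) != 0x80) return 0;
    uint32_t cp = ((uint32_t)(bytes[0] & 0x0F) << 12)     /* 4 payload bits */
                | ((uint32_t)(bytes[1] & 0x3F) << 6)      /* 6 payload bits */
                |  (uint32_t)(bytes[2] & 0x3F);           /* 6 payload bits */
    if (cp < 0x800) return 0;                             /* overlong encoding */
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;           /* surrogates are not valid */
    return cp;
}
```

For E4 BA BA the payload bits are 0100, 111010 and 111010, which combine to U+4EBA, the code point for 人. A single-byte encoding such as MacRoman instead maps each byte to its own character, which is where ‰∫∫ comes from.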

Unfamiliar C syntax in Objective-C context

I am coming to Objective-C from C# without any intermediate knowledge of C. (Yes, yes, I will need to learn C at some point and I fully intend to.) In Apple's Certificate, Key, and Trust Services Programming Guide, there is the following code:
static const UInt8 publicKeyIdentifier[] = "com.apple.sample.publickey\0";
static const UInt8 privateKeyIdentifier[] = "com.apple.sample.privatekey\0";
I have an NSString that I would like to use as an identifier here and for the life of me I can't figure out how to get that into this data structure. Searching through Google has been fruitless also. I looked at the NSString Class Reference and looked at the UTF8String and getCharacters methods but I couldn't get the product into the structure.
What's the simple, easy trick I'm missing?
Those are C strings: Arrays (not NSArrays, but C arrays) of characters. The last character is a NUL, with the numeric value 0.
“UInt8” is the CoreServices name for an unsigned octet, which (on Mac OS X) is the same as an unsigned char.
static means that the array is specific to this file (if it's in file scope) or persists across function calls (if it's inside a method or function body).
const means just what you'd guess: You cannot change the characters in these arrays.
\0 is a NUL, but including it explicitly in a "" literal as shown in those examples is redundant. A "" literal (without the @) is NUL-terminated anyway.
C doesn't specify an encoding. On Mac OS X, it's generally something ASCII-compatible, usually UTF-8.
To convert an NSString to a C-string, use UTF8String or cStringUsingEncoding:. To have the NSString extract the C string into a buffer, use getCString:maxLength:encoding:.
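The effect of the explicit \0 can be checked with sizeof. In this sketch (plain C, also valid in an Objective-C file) Byte8 is a stand-in typedef for UInt8, assumed here to be an unsigned char as described above, and the array and function names are made up for the illustration.

```c
#include <stddef.h>
#include <string.h>

typedef unsigned char Byte8;  /* stand-in for UInt8; an assumption for this sketch */

/* Same identifier, with and without the explicit trailing \0. */
static const Byte8 withExplicitNul[]    = "com.apple.sample.publickey\0";
static const Byte8 withoutExplicitNul[] = "com.apple.sample.publickey";

/* sizeof counts every byte in the array, including terminating NULs. */
static size_t explicit_size(void) { return sizeof withExplicitNul; }
static size_t implicit_size(void) { return sizeof withoutExplicitNul; }

/* strlen stops at the first NUL, so both arrays have the same string length. */
static size_t explicit_length(void) { return strlen((const char *)withExplicitNul); }
static size_t implicit_length(void) { return strlen((const char *)withoutExplicitNul); }
```

The identifier is 26 characters long, so the array without the explicit \0 occupies 27 bytes and the one with it occupies 28. Both are valid C strings of length 26; the explicit \0 only adds a second terminator. That extra byte matters only to code that measures the array with sizeof instead of strlen.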
I think some people are missing the point here. Everyone has explained the two constant arrays that are being set up for the tags, but if you want to use an NSString, you can simply add it to the attribute dictionary as-is. You don't have to convert it to anything. For example:
NSString *publicTag = @"com.apple.sample.publickey";
NSString *privateTag = @"com.apple.sample.privatekey";
The rest of the example stays exactly the same. In this case, there is no need for the C string literals at all.
Obtaining a char* (C string) from an NSString isn't the tricky part. (BTW, I'd also suggest UTF8String, it's much simpler.) The Apple-supplied code works because it's assigning a C string literal to the static const array variables. Assigning the result of a function or method call to a const will probably not work.
I recently answered an SO question about defining a constant in Objective-C, which should help your situation. You may have to compromise by getting rid of the const modifier. If it's declared static, you at least know that nobody outside the compilation unit where it's declared can reference it, so just make sure you don't let a reference to it "escape" such that other code could modify it via a pointer, etc.
However, as @Jason points out, you may not even need to convert it to a char* at all. The sample code creates an NSData object for each of these strings. You could just do something like this within the code (replacing steps 1 and 3):
NSData* publicTag = [@"com.apple.sample.publickey" dataUsingEncoding:NSUnicodeStringEncoding];
NSData* privateTag = [@"com.apple.sample.privatekey" dataUsingEncoding:NSUnicodeStringEncoding];
That sure seems easier to me than dealing with the C arrays if you already have an NSString.
Try this:
NSString *newString = @"This is a test string.";
const char *theString;
theString = [newString cStringUsingEncoding:[NSString defaultCStringEncoding]];