Inserting hebrew into NSMutableArrey - objective-c

When I insert Hebrew (LTR) string into NSMutableArrey, the string is distorted somehow.
What do I do?
NSString *peace = #"שלום";
NSLog(#"peace - %#", peace);
NSMutableArray *peaceArrey = [[NSMutableArray alloc]initWithCapacity:1];
[peaceArrey addObject:peace];
NSLog(#"peaceArrey - %#",peaceArrey);
And here is the log:
peace - שלום
peaceArrey - (
"\U05e9\U05dc\U05d5\U05dd"
)

everything should be ok, try NSLog(#"%#", peaceArrey[0]);
the result you are seeing is just the way NSArrays are printed: unicode chars are represented as their codes.

Don't mistake the representation that gets logged as the actual value of the object. NSArray's description is in an old-style property list format. Among other things, that means that non-ASCII values in strings are represented as escape sequences. You're seeing the Unicode characters as a series of UTF-16 code units expressed as escape sequences.

When using the %# format specifier NSLog calls description on the argument to log the string.
In case of a plain string this method just returns the string:
NSLog(#"string: %#", #"שלום");
// prints שלום
If you, on the other hand, put the string into an array, the NSArray's description method is called which, in turn, calls descriptionWithLocale:indent:.
This method just creates a property list formatted string. It uses the NSPropertyListOpenStepFormat which is ASCII encoded. That's why it has to escape the hebrew unicode characters.

Related

How to do NSLog with variable

What should be the correct format of the below to print *newString ?
NSString *newString = #"Hello this is a string!";
NSLog(#newString);
NSLog works pretty much as a C printf, with the addition of the %# string format specifier, which is meant for objects. Being NSString an object, %# is the right format to use:
NSString *newString = #"Hello this is a string!";
NSLog(#"%#", newString);
For as tempting as it can look, NEVER do
NSLog(newString); //NONONONONO!
since it's a terrible practice that may lead to unexpected crashes (not to mention security issues).
More info on the subject: Warning: "format not a string literal and no format arguments"
The # symbol is just a shorthand for specifying some common Objective-C objects. #"..." represents a string (NSString to be specific, which is different from regular C strings), #[...] represents an array (NSArray), #{...} represents a dictionary (NSDictionary).
On the first line, you've already specified a NSString object using the # sign. newString is now an NSString instance. On the second line, you can just give it's variable name:
NSLog(newString);
You could theoretically just give the variable name, but it is a dangerous approach. If newString has any format specifiers, your app may crash/mess up (or access something that it shouldn't be accesing) because NSLog would try to read the arguments corresponding to the format specifiers, but the arguments don't exist. The safe solution would be NSLog(#"%#", newString);. The first argument to NSLog is now hard-coded and can't be changed. We now know that it will expect a single argument, that we are providing that argument, newString, so we are safe.
Because you've already specified a string and just passing that instance to NSLog, you don't need the # sign again.

Working with big numbers in Objective-C?

I need to convert values like 1393443048683555715 to HEX. But, first of all, i cann't display it as decimal using NSLog(), for example.
Ok, it works:
NSLog(#"%qu", 1393443048683555706);
But what about converting to HEX. What type i have to use to store this big value?
NSLog([NSString stringWithFormat: #"%x", 1393443048683555706]);
// result eb854b7a. It's incorrect result!
but i forgot to say that this big number represented as string #"1393443048683555706" (not int)
You can use %qi and %qu format specifiers with NSLog to display 64-bit integers. Your constant appears to fit in 64-bit signed number, with the limits of:
[−9223372036854775808 to 9223372036854775807]
The "x" format specifier is for 32-bit numbers; you need to use either "qx" or "qX" (depending on whether you want the letter values to be uppercase or not). These are the formatters for unsigned long long values, see:
https://developer.apple.com/library/mac/#documentation/Cocoa/Conceptual/Strings/Articles/formatSpecifiers.html#//apple_ref/doc/uid/TP40004265-SW1
Next, you should not pass a string as you have done above directly to NSLog - this can cause a crash.
NSLog(string); // bad!!
NSLog(#"%#", string); // good
So if your value comes as a string, you'll want to do this:
NSString *longNumber = #"1393443048683555706";
NSLog(#"%qx", [longNumber longLongValue]);
If the string value can't be coerced to a number, longLongValue will return 0. I'll leave it to you do handle the error (and bounds) checking - see NSString for details.
If you want to save the hex value as a string, do this:
NSString *hexRepresentation = [NSString stringWithFormat:#"%qx", [longNumber longLongValue]];
Again, best to take care for error handling.

Unihan: combining UTF-8 chars

I am using data that involves Chinese Unihan characters in an Objective-C app. I am using a voice recognition program (cmusphinx) that returns a phrase from my data. It returns UTF-8 characters and when returning a Chinese character (which is three bytes) it separates it into three separate characters.
Example: When I want 人 to, I see: ‰∫∫. This is the proper in coding (E4 BA BA), but my code sees the returned value as three seperate characters rather than one.
Actually, my function is receiving the phrase as an NSString, (due to a wrap around) which uses UTF-16. I tried using Objective-C's built in conversion methods (to UTF-8 and from UTF-16), but these keep my string as three characters.
How can I decode these three separate characters into the one utf-8 codepoint for the Chinese character?
Or how can I properly encode it?
This is code fragment dealing with the cstring returned from sphinx and its encoding to a NSString:
const char * hypothesis = ps_get_hyp(pocketSphinxDecoder, &recognitionScore, &utteranceID);
NSString *hypothesisString = [[NSString alloc] initWithCString:hypothesis encoding:NSMacOSRomanEncoding];
Edit: From looking at the addition to your post, you actually do have control over the string encoding. In that case, why are you creating the string with NSMacOSRomanEncoding when you're expecting utf-8? Just change that to NSUTF8StringEncoding.
It sounds like what you're saying is you're being given an NSString that contains UTF-8 data that's being interpreted as a single-byte encoding (e.g. ISO-Latin-1, MacRoman, etc). I'm assuming here that you have no control over the code that creates the NSString, because if you did then the solution is just to change the encoding it's initializing with.
In any case, what you're asking for is a way to take the data in the string and convert it back to UTF-8. You can do this by creating an NSData from the NSString using whatever encoding its was originally created with (you need to know this much, at least, or it won't work), and then you can create a new NSString from the same data using UTF-8.
From the example character you gave (人) it looks like it's being interpreted as MacRoman, so lets go with that. The following code should convert it back:
- (NSString *)fixEncodingOfString:(NSString *)input {
CFStringEncoding cfEncoding = kCFStringEncodingMacRoman;
NSStringEncoding encoding = CFStringCovnertEncodingToNSStringEncoding(cfEncoding);
NSData *data = [input dataUsingEncoding:encoding];
if (!data) {
// the string wasn't actually in MacRoman
return nil;
}
NSString *output = [[[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding] autorelease];
}

Objective-C and scanning an integer or character to a hex representation

NSScanner has the instance method -scanHexInt: for converting a hexadecimal string representation of an int to an int. I'd like to invert this: I'd like a method that takes an int and returns a hexadecimal representation.
Is there a ready made method in the docs?
[To be more relevant, I'd like to take a string comprising a single kanji and return its unicode point.]
If you check the documentation on string format specifiers, you'll see that there are several specifiers you can use to get a hexadecimal string representation of a number. For instance, you could use:
[NSString stringWithFormat:#"%x", foo]

Xcode - EXEC_BAD_ACCESS when concatenting a large string

I'm getting a EXEC_BAD_ACCESS when concatenting a large string.
I've read from a feed and to create my webview I build up my string like:
NSString *pageData = #"<h1>header</h1>";
pageData = [pageData stringByAppendingFormat#"<p>"];
pageData = [pageData stringByAppendingFormat#"self.bodyText"];
pageData = [pageData stringByAppendingFormat#"</p>"];
etc
The problem I've got is self.bodytext is 21,089 characters with spaces when I do a count on word.
Is there a better method for doing this?
Thanks
You would definitely want to use NSMutableString for something like this:
NSMutableString * pageData = [NSMutableString stringWithCapacity:0];
[pageData appendFormat:#"<h1>header</h1>"];
[pageData appendFormat:#"<p>"];
...
NSMutableString is designed for this kind of sequential concatenation, where the basic NSString class is really not meant to be used in this manner. Your original code would actually allocate a new NSString every time you called stringByAppendFormat:, and then procede to copy into it all of the thousands of characters you had already appended. This could easily result in an out of memory error, since the size of the temporary strings would be growing exponentially as you add more and more calls.
Using NSMutableString will not re-copy all of the string data when you call appendFormat:, since the mutable string maintains an internal buffer and simply tacks new strings on to the end of it. Depending on the size of your string, you may want to reserve a huge chunk of memory ahead of time (use a meaningful number for the ...WithCapacity: argument). But there is no need to go that route unless you actually run into performance issues.
There are a few problems with your sample code:
You should be using a NSMutableString to build up an output string by appending multiple parts. NSString is an immutable class which means that each time you call stringByAppendingFormat: you are incurring the overhead of creating an additional new NSString object which will need to be collected and released by the autorelease pool.
NSMutableString * pageData = [NSMutableString stringWithCapacity:0];
You should use appendString: on your NSMutableString to append content, instead of stringByAppendingFormat: or appendFormat:. The format methods are intended for creating new strings based on a format specifier which includes special fields as placeholders. See Formatting String Objects for more details. When you're using stringByAppendingFormat: with just a literal string like your code has, you are incurring the overhead of parsing the string for the non-existant placeholders, and more importantly, if the string happens to have a placeholder (or something that looks like one) in it, you'll end up with the EXEC_BAD_ACCESS crash that you are getting. Most likely this happening when your bodyText is appended. Thus if you simply want to append a '' to your NSMutableString do something like this:
[pageData appendString:#"<p>"];
If you want to append the contents of the self.bodyText property to the string, you shouldn't put the name of the property inside of a string literal (i.e. #"self.bodyText" is the literal string "self.bodyText", not the contents of the property. Try:
[pageData appendString:self.bodyText];
As an example, you could actually combine all three lines of your sample code by using a format specification:
pageData = [pageData stringByAppendingFormat:#"<p>%#</p>", self.bodyText];
In the format specification %# is a placeholder that means insert the result of sending the description or descriptionWithLocale: message to the object. For an NSString this is simply the contents of the string.
I doubt the length of the string is really a problem. A 50,000-character string is only about 100 KB. But you want to be very careful about using format strings. If your string contains something that looks like a formatting specifier, there had better be a corresponding argument or you'll get garbage if you're lucky and a crash if you're not. I suspect this is the error, since there is no other obvious problem from your description. Be careful about what you put in there, and avoid ever putting dynamic text in a format string — just put a %# in the format string and pass the dynamic text as an argument.
Use appendString: instead of appendFormat: when dealing with arbitrary strings.
pageData = [pageData stringByAppendingString:#"<p>"];
pageData = [pageData stringByAppendingString:#"self.bodyText"];
pageData = [pageData stringByAppendingString:#"</p>"];
or do not use an arbitrary string as the format:
pageData = [pageData stringByAppendingFormat:#"<p>%#</p>" , #"self.bodyText"];
If you are building the string up in pieces, use NSMutableString instead of several stringBy calls.
Remember that % is a special character for formatted strings and for url escapes, so if bodyText contains a url it could easily cause a crash.