NSLog incorrect encoding - objective-c

I've got a problem with the following code:
NSString *strValue=@"你好";
char temp[200];
strcpy(temp, [strValue UTF8String]);
printf("%s", temp);
NSLog(@"%s", temp);
The first line assigns a string literal containing two Chinese characters. The problem is that printf displays the Chinese characters properly, but NSLog doesn't.
Thanks to all. I figured out a solution for this problem. Foundation uses UTF-16 by default, so in order to output the C string in the example with NSLog, I have to call cStringUsingEncoding: to get a UTF-16 C string and use %S instead of %s.
NSString *strValue=@"你好";
char temp[200];
strcpy(temp, [strValue UTF8String]);
printf("%s", temp);
// Caution: strcpy stops at the first zero byte, so this only works when no
// UTF-16 code unit contains one (it breaks on plain ASCII text, for example).
strcpy(temp, [strValue cStringUsingEncoding:NSUTF16LittleEndianStringEncoding]);
NSLog(@"%S", (const unichar *)temp);

NSLog's %s format specifier is in the system encoding, which seems to always be MacRoman and not Unicode, so it can only display characters in MacRoman encoding. Your best option with NSLog is just to use the native object format specifier %@ and pass the NSString directly instead of converting it to a C string. If you only have a C string and you want to use NSLog to display a message instead of printf or asl, you will have to do something like Don suggests in order to convert the string to an NSString object first.
So, all of these should display the expected string:
NSString *str = @"你好";
const char *cstr = [str UTF8String];
NSLog(@"%@", str);
printf("%s\n", cstr);
NSLog(@"%@", [NSString stringWithUTF8String:cstr]);
If you do decide to use asl, note that while it accepts strings in UTF8 format and passes the correct encoding to the syslog daemon (so it will show up properly in the console), it encodes the string for visual encoding when displaying to the terminal or logging to a file handle, so non-ASCII values will be displayed as escaped character sequences.

My guess is that NSLog assumes a different encoding for 8-bit C-strings than UTF-8, and it may be one that doesn't support Chinese characters. Awkward as it is, you might try this:
NSLog(@"%@", [NSString stringWithCString: temp encoding: NSUTF8StringEncoding]);

I know you are probably looking for an answer that will help you understand what's going on.
But this is what you could do to solve your problem right now:
NSLog(@"%@", strValue);

#define NSLogUTF8(a,b) NSLog(a,[NSString stringWithCString:[[NSString stringWithFormat:@"%@",b] cStringUsingEncoding:NSUTF8StringEncoding] encoding:NSNonLossyASCIIStringEncoding])
#define NSLogUTF8Ex(a,b) NSLog(a,[MLTool utf8toNString:[NSString stringWithFormat:@"%@",b]])
+(NSString*)utf8toNString:(NSString*)str{
// CFStringTransform edits the string in place, so it needs a mutable copy.
NSMutableString *strT = [[str stringByReplacingOccurrencesOfString:@"\\U" withString:@"\\u"] mutableCopy];
CFStringRef transform = CFSTR("Any-Hex/Java");
CFStringTransform((__bridge CFMutableStringRef)strT, NULL, transform, YES);
return strT;
}

Related

Trimmed string changes its length only by half

Could anyone give some advice, please?
In my iOS app I am parsing XML (with the help of a third-party library) and have a problem with extra whitespace and newlines at the beginning and end of the strings. The string this third-party library returns is a C++ std::wstring, which I convert to NSString (the encoding should be right, as the content of the new NSString matches the corresponding part of my XML file). After trimming, the length of "empty" elements (those that contain only whitespace and newlines) doesn't become zero but only drops by half.
The code is below:
std::wstring val;
NSString *initial = [[NSString alloc] initWithBytes:val.data() length:sizeof (wchar_t)*val.size() encoding:NSUTF16LittleEndianStringEncoding];
NSString *trimmed = [initial stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
If I try to output it with NSLog(@"bybyby'%@'bebebe", trimmed); the 'bebebe part is never displayed. It looks like some newlines or whitespace characters are left over that can't be detected.
wchar_t is a 32-bit integer (on iOS and OS X), therefore you must use NSUTF32LittleEndianStringEncoding for the conversion to NSString.
Example:
std::wstring val (L" Hello World ");
NSString *initial = [[NSString alloc] initWithBytes:val.data() length:sizeof (wchar_t)*val.size() encoding:NSUTF32LittleEndianStringEncoding];
NSString *trimmed = [initial stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
NSLog(@"'%@'", trimmed);
// Output: 'Hello World'
What probably happened in your case (with NSUTF16LittleEndianStringEncoding)
is that every second character in the initial string is a NUL character,
which acts as a terminator when printed.
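To make the "every second character is NUL" point concrete, here is a minimal sketch in plain C (which also compiles unchanged in an Objective-C file). It is not the library's or Foundation's code. The helper name utf32le_read_as_utf16le is hypothetical; it writes each code point as four little-endian bytes, the way a 32-bit wchar_t is stored on iOS and OS X, and then reads the same bytes back two at a time as a UTF-16LE decoder would.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: lays out `count` code points as UTF-32LE bytes and
 * then reads those same bytes back as UTF-16LE code units.
 * `out` must have room for 2 * count units. Returns the number of units. */
size_t utf32le_read_as_utf16le(const uint32_t *code_points, size_t count,
                               uint16_t *out)
{
    assert(code_points != NULL && out != NULL);
    size_t n = 0;
    for (size_t i = 0; i < count; i++) {
        uint32_t cp = code_points[i];
        /* The four little-endian bytes of one UTF-32 unit. */
        uint8_t b[4] = {
            (uint8_t)(cp & 0xFF), (uint8_t)((cp >> 8) & 0xFF),
            (uint8_t)((cp >> 16) & 0xFF), (uint8_t)((cp >> 24) & 0xFF)
        };
        /* The same bytes taken two at a time, as a UTF-16LE decoder would. */
        out[n++] = (uint16_t)(b[0] | (b[1] << 8));
        out[n++] = (uint16_t)(b[2] | (b[3] << 8));
    }
    return n;
}
```

For characters below U+10000 the second unit of every pair is zero, so a string of N wide characters turns into 2N UTF-16 units with a NUL after each real character.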

Weird error with NSString: No known class method for selector 'stringWithBytes:length:encoding:'

I am attempting to use scanf to assign a value to an NSString, as per the answers to this question by Omar. This is the code, taken straight from progrmr's answer:
char word[40];
int nChars = scanf("%39s", word); // read up to 39 chars (leave room for NUL)
NSString* word2 = [NSString stringWithBytes:word
length:nChars
encoding:NSUTF8StringEncoding];
However, I'm getting an error on the last line that makes absolutely no sense to me:
No known class method for selector 'stringWithBytes:length:encoding:'
What in the world could be causing this error?
And yes, I do have #import <Foundation/Foundation.h> at the top of the file.
NSString does not have a stringWithBytes:length:encoding: class method, but you can use
NSString* word2 = [[NSString alloc] initWithBytes:word
length:nChars
encoding:NSUTF8StringEncoding];
Note however, that scanf() returns the number of scanned items and
not the number of scanned characters. So nChars will contain 1 and not the string length, so you should set nChars = strlen(word) instead.
A simpler alternative is (as also mentioned in one answer to the linked question)
NSString* word2 = [NSString stringWithUTF8String:word];
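The difference between "items scanned" and "characters scanned" can be checked with a small C sketch. It uses sscanf on a fixed input string instead of scanf on stdin so it is easy to try; the helper name scan_word is hypothetical and not part of any library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: scans one whitespace-delimited word from `input`
 * into `word` (at least 40 bytes). Stores the character count in *length
 * and returns what sscanf returned, i.e. the number of converted items. */
int scan_word(const char *input, char word[40], size_t *length)
{
    assert(input != NULL && word != NULL && length != NULL);
    word[0] = '\0';                      /* stays empty if nothing is read */
    int items = sscanf(input, "%39s", word);
    *length = strlen(word);              /* the value to pass as length: */
    return items;
}
```

With the input "hello world" the return value is 1 (one item converted) while the stored length is 5, which is the number initWithBytes:length:encoding: needs.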
NSString does not respond to the selector stringWithBytes:length:encoding:. You probably wanted initWithBytes:length:encoding:.
In short: consider using an initializer that takes a const char C string for your NSString object. Also, allocate the object before sending it any initializer message. I would expect something like:
char word[40];
int nChars = scanf("%39s", word);
NSString *word2 = [[NSString alloc] initWithCString:word encoding:NSASCIIStringEncoding];
Note that initWithCString per design only supports properly null '\0' terminated 8-bit character arrays. For unterminated bytes arrays you have initWithBytes:length:encoding: instead.
For Unicode characters you could consider initWithCharactersNoCopy:length:freeWhenDone:.

How do I convert a NSString into a std::string?

I have an NSString object and want to convert it into a std::string.
How do I do this in Objective-C++?
NSString *foo = @"Foo";
std::string bar = std::string([foo UTF8String]);
Edit: After a few years, let me expand on this answer. As rightfully pointed out, you'll most likely want to use cStringUsingEncoding: with NSASCIIStringEncoding if you are going to end up using std::string. You can use UTF-8 with normal std::strings, but keep in mind that those operate on bytes and not on characters or even graphemes. For a good "getting started", check out this question and its answer.
Also note, if you have a string that can't be represented as ASCII but you still want it in an std::string and you don't want non-ASCII characters in there, you can use dataUsingEncoding:allowLossyConversion: to get an NSData representation of the string with lossy encoded ASCII content, and then throw that at your std::string
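The "bytes, not characters" caveat is easy to see with a rough C sketch that works on the same kind of buffer UTF8String hands back. The helper utf8_code_point_count is a hypothetical name, and it assumes the input is valid UTF-8; it counts code points, not graphemes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts Unicode code points in a NUL-terminated UTF-8 string by skipping
 * continuation bytes (those of the form 10xxxxxx). Assumes valid UTF-8. */
size_t utf8_code_point_count(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            count++;                     /* lead byte or plain ASCII byte */
        }
    }
    return count;
}
```

For the UTF-8 encoding of 你好, strlen (and std::string::size()) reports 6 bytes while this function reports 2 code points; for "Foo" both agree on 3.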
As Ynau suggested in the comments, in the general case it would be better to keep everything on the stack instead of the heap (using new creates the string on the heap), hence (assuming UTF-8 encoding):
NSString *foo = @"Foo";
std::string bar([foo UTF8String]);
As noted on philjordan.eu, the NSString could also be nil, in which case UTF8String returns NULL. That leads to a conversion like this:
NSString *foo = @"Foo";
// NOTE: if foo is nil this will produce an empty C++ string
// instead of dereferencing the NULL pointer from UTF8String.
std::string bar = std::string([foo UTF8String], [foo lengthOfBytesUsingEncoding:NSUTF8StringEncoding]);

Objective-C: Reading contents of a file into an NSString object doesn't convert unicode

I have a file, which I'm reading into an NSString object using stringWithContentsOfFile. It contains Unicode for Japanese characters such as:
\u305b\u3044\u3075\u304f
which I believe is
せいふく
I would like my NSString object to store the string as the latter, but it is storing it as the former.
The thing I don't quite understand is that when I do this:
NSString *myString = [NSString stringWithContentsOfFile:path encoding:NSUTF8StringEncoding error:nil];
It stores it as: \u305b\u3044\u3075\u304f.
But when I hardcode in the string:
NSString *myString = @"\u305b\u3044\u3075\u304f";
It correctly converts it and stores it as: せいふく
Does stringWithContentsOfFile escape the Unicode in some way? Any help will be appreciated.
Thanks.
In the file \u305b\u3044\u3075\u304f are just normal characters. So you are getting them in string. You need to save actual Japanese characters in the file. That is, store せいふく in file and that will be loaded in the string.
You can try this, though I don't know how feasible it is:
NSArray *unicodeArray = [stringFromFile componentsSeparatedByString:@"\\u"];
NSMutableString *finalString = [[NSMutableString alloc] initWithString:@""];
for (NSString *unicodeString in unicodeArray) {
if (![unicodeString isEqualToString:@""]) {
// scanHexInt: writes a full unsigned int, so scan into one and narrow afterwards.
unsigned int hexValue = 0;
[[NSScanner scannerWithString:unicodeString] scanHexInt:&hexValue];
unichar codeValue = (unichar)hexValue;
NSString* betaString = [NSString stringWithCharacters:&codeValue length:1];
[finalString appendString:betaString];
}
}
//finalString should have せいふく
Something like \u305b in an Objective-C string is in fact an instruction to the compiler to replace it with the actual UTF-8 byte sequence for that character. The method reading the file is not a compiler, and only reads the bytes it finds. So to get that character (officially called "code point"), your file must contain the actual UTF-8 byte sequence for that character, and not the symbolic representation \u305b.
It's a bit like \x43. This is, in your source code, four characters, but it is replaced by one byte with value 0x43. So if you write @"\x43" to a file, the file will not contain the four characters '\', 'x', '4', '3', it will contain the single character 'C' (which has ASCII value 0x43).
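If the file really has to contain the symbolic form, the work the compiler does for a literal has to be done by hand at run time. Below is a minimal C sketch of that idea, not what stringWithContentsOfFile or the compiler actually runs. The helper name decode_u_escapes is hypothetical, and it assumes escapes are exactly a backslash, a lowercase u and four hex digits, covers the Basic Multilingual Plane only, and does not combine surrogate pairs.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies `in` to `out`, replacing each literal
 * backslash-u-plus-four-hex-digits sequence with the UTF-8 bytes of that
 * code point. Returns 0 on success, -1 if `out` is too small. */
int decode_u_escapes(const char *in, char *out, size_t out_size)
{
    assert(in != NULL && out != NULL && out_size > 0);
    size_t o = 0;
    while (*in != '\0') {
        unsigned char bytes[3];
        size_t n;
        if (in[0] == '\\' && in[1] == 'u' &&
            isxdigit((unsigned char)in[2]) && isxdigit((unsigned char)in[3]) &&
            isxdigit((unsigned char)in[4]) && isxdigit((unsigned char)in[5])) {
            char hex[5] = { in[2], in[3], in[4], in[5], '\0' };
            unsigned long cp = strtoul(hex, NULL, 16);
            if (cp < 0x80) {                       /* 1-byte form */
                bytes[0] = (unsigned char)cp;
                n = 1;
            } else if (cp < 0x800) {               /* 2-byte form */
                bytes[0] = (unsigned char)(0xC0 | (cp >> 6));
                bytes[1] = (unsigned char)(0x80 | (cp & 0x3F));
                n = 2;
            } else {                               /* 3-byte form */
                bytes[0] = (unsigned char)(0xE0 | (cp >> 12));
                bytes[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
                bytes[2] = (unsigned char)(0x80 | (cp & 0x3F));
                n = 3;
            }
            in += 6;
        } else {
            bytes[0] = (unsigned char)*in++;       /* ordinary byte, copied unchanged */
            n = 1;
        }
        if (o + n >= out_size) {
            return -1;                             /* no room for data plus NUL */
        }
        memcpy(out + o, bytes, n);
        o += n;
    }
    out[o] = '\0';
    return 0;
}
```

The resulting buffer can then be handed to stringWithUTF8String:. Note that in the C source the test input has to be written with doubled backslashes ("\\u305b"), which is exactly the distinction described above: a single backslash would be processed by the compiler before the program ever runs.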

Using scanf with NSStrings

I want the user to input a string and then assign the input to an NSString. Right now my code looks like this:
NSString *word;
scanf("%s", &word);
The scanf function reads into a C string (actually an array of char), like this:
char word[40];
int nItems = scanf("%39s", word); // read up to 39 chars (leave room for NUL); returns the number of items read, not characters
You can convert a char array into NSString like this:
NSString* word2 = [[NSString alloc] initWithBytes:word
length:strlen(word)
encoding:NSUTF8StringEncoding];
However scanf only works with console (command line) programs. If you're trying to get input on a Mac or iOS device then scanf is not what you want to use to get user input.
scanf does not work with any object types. If you have a C string and want to create an NSString from it, use -[NSString initWithBytes:length:encoding:].
scanf does not work with NSString as scanf doesn’t work on objects. It works only on primitive datatypes such as:
int
float
BOOL
char
What to do?
Technically a string is made up of a sequence of individual characters. So to accept string input, you can read in the sequence of characters and convert it to a string.
Use:
[NSString stringWithCString:cstring encoding:NSASCIIStringEncoding];
Here is a working example:
NSLog(@"What is the first name?");
char cstring[40];
scanf("%s", cstring);
firstName = [NSString stringWithCString:cstring encoding:NSASCIIStringEncoding];
Here’s an explanation of the above code, line by line:
You declare a variable called cstring to hold 40 characters.
You then tell scanf to expect a list of characters by using the %s format specifier.
Finally, you create an NSString object from the list of characters that were read in.
Run your project; if you enter a word and hit Enter, the program should print out the same word you typed. Just make sure the word is less than 40 characters; if you enter more, you might cause the program to crash — you are welcome to test that out yourself! :]
Taken from: RW.
This is how I'd do it:
char word[40];
scanf("%39s", word); // limit input to 39 chars so the buffer cannot overflow
NSString * userInput = [[NSString alloc] initWithCString: word encoding: NSUTF8StringEncoding];
Yes, but sscanf does, and may be a good solution for complex NSString parsing.
Maybe this will work for you because it accepts string with spaces as well.
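NSLog(@"Enter The Name Of State");
char name[20];
// gets() has no bounds check and was removed in C11; fgets is the bounded replacement (it keeps the trailing newline).
fgets(name, sizeof(name), stdin);
NSLog(@"%s",name);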
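Reading a whole line, spaces included, without risking a buffer overflow can be sketched in plain C like this. The helper name read_line is hypothetical; it wraps fgets and strips the newline that fgets leaves in the buffer. Lines longer than the buffer are simply cut off at size - 1 characters, and the rest stays in the stream.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line (spaces included) from `in` into
 * `buf`, never writing more than `size` bytes, and strips the trailing
 * newline that fgets keeps. Returns 1 on success, 0 on EOF or error. */
int read_line(FILE *in, char *buf, size_t size)
{
    assert(in != NULL && buf != NULL && size > 0);
    if (fgets(buf, (int)size, in) == NULL) {
        buf[0] = '\0';
        return 0;
    }
    buf[strcspn(buf, "\n")] = '\0';      /* cut at the newline, if present */
    return 1;
}
```

In a console program this would be called as read_line(stdin, name, sizeof name), and the buffer could then be passed to stringWithUTF8String: to get an NSString.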
A simple solution is:
char word[40];
scanf("%39s", word);
NSString* word2 = [NSString stringWithUTF8String:word];
The NSFileHandle class is an object-oriented wrapper for a file descriptor. For files, you can read, write, and seek within the file.
NSFileHandle *inputFile = [NSFileHandle fileHandleWithStandardInput];
NSData *inputData = [inputFile availableData];
NSString *word = [[NSString alloc]initWithData:inputData encoding:NSUTF8StringEncoding];