Objective-C: Reading contents of a file into an NSString object doesn't convert unicode - objective-c

I have a file that I'm reading into an NSString object using stringWithContentsOfFile. It contains Unicode escape sequences for Japanese characters, such as:
\u305b\u3044\u3075\u304f
which I believe is
せいふく
I would like my NSString object to store the string as the latter, but it is storing it as the former.
The thing I don't quite understand is that when I do this:
NSString *myString = [NSString stringWithContentsOfFile:path encoding:NSUTF8StringEncoding error:nil];
It stores it as: \u305b\u3044\u3075\u304f.
But when I hardcode in the string:
NSString *myString = @"\u305b\u3044\u3075\u304f";
It correctly converts it and stores it as: せいふく
Does stringWithContentsOfFile escape the Unicode in some way? Any help will be appreciated.
Thanks.

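In the file, \u305b\u3044\u3075\u304f is just a sequence of ordinary characters (a backslash, a 'u', and four hex digits, repeated), so that is exactly what you get in the string. You need to save the actual Japanese characters in the file. That is, store せいふく in the file, UTF-8 encoded, and that is what will be loaded into the string.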

You can try this; I don't know how robust it is. It assumes the string contains nothing but \uXXXX escapes.
NSArray *unicodeArray = [stringFromFile componentsSeparatedByString:@"\\u"];
NSMutableString *finalString = [NSMutableString string];
for (NSString *unicodeString in unicodeArray) {
    if (unicodeString.length >= 4) {
        // scanHexInt: expects an unsigned int *, not a unichar *
        unsigned int codeValue = 0;
        // Only the four hex digits belong to the escape; anything after them is ignored here
        NSString *hexDigits = [unicodeString substringToIndex:4];
        if ([[NSScanner scannerWithString:hexDigits] scanHexInt:&codeValue]) {
            unichar character = (unichar)codeValue;
            [finalString appendString:[NSString stringWithCharacters:&character length:1]];
        }
    }
}
// finalString should now contain せいふく

Something like \u305b in an Objective-C string is in fact an instruction to the compiler to replace it with the actual UTF-8 byte sequence for that character. The method reading the file is not a compiler, and only reads the bytes it finds. So to get that character (officially called "code point"), your file must contain the actual UTF-8 byte sequence for that character, and not the symbolic representation \u305b.
It's a bit like \x43. In your source code this is four characters, but the compiler replaces it with one byte with value 0x43. So if you write @"\x43" to a file, the file will not contain the four characters '\', 'x', '4', '3'; it will contain the single character 'C' (which has ASCII value 0x43).
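If the file format cannot be changed and it really does contain literal \uXXXX text, the escapes can be decoded after loading. Here is a minimal sketch using CFStringTransform with the "Any-Hex/Java" transform in reverse. The helper name DecodeUnicodeEscapes is made up for illustration, and the code assumes ARC.

```objectivec
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the answers above): converts literal
// "\uXXXX" escape text into the characters those escapes name.
static NSString *DecodeUnicodeEscapes(NSString *escaped) {
    // CFStringTransform edits the string in place, so work on a mutable copy.
    NSMutableString *result = [escaped mutableCopy];
    // "Any-Hex/Java" turns characters into \uXXXX escapes; passing true for
    // the reverse argument runs it backwards, from escapes to characters.
    CFStringTransform((__bridge CFMutableStringRef)result, NULL,
                      CFSTR("Any-Hex/Java"), true);
    return result;
}
```

Called on the string returned by stringWithContentsOfFile:encoding:error:, this should leave ordinary text alone and replace only the escape sequences.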

Related

Trimmed string's length is only reduced by half

Could anyone give me some advice, please?
In my iOS app I am parsing XML (with the help of a third-party library) and have a problem with extra whitespace and newlines at the beginning and end of the strings. The library returns a C++ std::wstring, which I convert to NSString (the encoding should be right, as the content of the new NSString matches the corresponding part of my XML file). After trimming, the length of "empty" elements (those containing only whitespace and newlines) doesn't become zero; it is only reduced by half.
The code is below:
std::wstring val;
NSString *initial = [[NSString alloc] initWithBytes:val.data() length:sizeof (wchar_t)*val.size() encoding:NSUTF16LittleEndianStringEncoding];
NSString *trimmed = [initial stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
If I try to output it with NSLog(@"bybyby'%@'bebebe", trimmed); the trailing 'bebebe is never displayed. It looks as if some newlines or whitespace are left over that can't be detected.
wchar_t is a 32-bit integer (on iOS and OS X), therefore you must use NSUTF32LittleEndianStringEncoding for the conversion to NSString.
Example:
std::wstring val (L" Hello World ");
NSString *initial = [[NSString alloc] initWithBytes:val.data() length:sizeof (wchar_t)*val.size() encoding:NSUTF32LittleEndianStringEncoding];
NSString *trimmed = [initial stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
NSLog(@"'%@'", trimmed);
// Output: 'Hello World'
What probably happened in your case (with NSUTF16LittleEndianStringEncoding)
is that every second character in the initial string is a NUL character,
which acts as a terminator when printed.
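To avoid hard-coding the width of wchar_t, the encoding can be chosen from sizeof(wchar_t) at the call site. This is only a sketch: the function name StringFromWideCString is hypothetical, and it assumes a little-endian platform and ARC.

```objectivec
#import <Foundation/Foundation.h>
#include <wchar.h>

// Hypothetical helper: builds an NSString from a NUL-terminated wide string.
// Assumption: the platform is little-endian.
static NSString *StringFromWideCString(const wchar_t *wide) {
    // Pick the encoding that matches the actual size of wchar_t.
    NSStringEncoding encoding = (sizeof(wchar_t) == 4)
        ? NSUTF32LittleEndianStringEncoding
        : NSUTF16LittleEndianStringEncoding;
    return [[NSString alloc] initWithBytes:wide
                                    length:wcslen(wide) * sizeof(wchar_t)
                                  encoding:encoding];
}
```

With a std::wstring in Objective-C++, the same function can be called as StringFromWideCString(val.c_str()).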

How to get a single NSString character from an NSString

I want to get a character from somewhere inside an NSString. I want the result to be an NSString.
This is the code I use to get a single character at index it:
[[s substringFromIndex:i] substringToIndex:1]
Is there a better way to do it?
This will also retrieve a character at index i as an NSString, and you're only using an NSRange struct rather than an extra NSString.
NSString * newString = [s substringWithRange:NSMakeRange(i, 1)];
If you just want to get one character from an NSString, you can try this.
- (unichar)characterAtIndex:(NSUInteger)index;
Used like so:
NSString *originalString = @"hello";
int index = 2;
// %c is only reliable for ASCII; %C is the specifier for a unichar
NSString *theCharacter = [NSString stringWithFormat:@"%c", [originalString characterAtIndex:index-1]];
// returns "e"
Your suggestion only works for simple characters like ASCII. NSString stores Unicode, and if your character is several unichars long you could end up with gibberish. Use
- (NSRange)rangeOfComposedCharacterSequenceAtIndex:(NSUInteger)index;
to determine how many unichars your character occupies. I use this to step through my strings and find where the character boundaries occur.
Being fully Unicode-aware is a bit of work, and how much depends on which languages you handle. I see a lot of Asian text in which characters spill over a single unichar, so it's work that I need to do.
NSString *myString = @"Malayalam";
NSString *revString = @"";
for (NSUInteger i = 0; i < myString.length; i++) {
    // Prepend each unichar to the result; %C formats a unichar, %@ an object
    revString = [NSString stringWithFormat:@"%C%@", [myString characterAtIndex:i], revString];
}
NSLog(@"%@", revString);
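The point about composed character sequences can be made concrete with a short sketch. Both function names below are hypothetical; the first returns the whole user-visible character that covers a given index, and the second reverses a string one composed character at a time instead of one unichar at a time.

```objectivec
#import <Foundation/Foundation.h>

// Hypothetical helper: the full composed character covering `index`,
// which may be more than one unichar long (surrogate pairs, combining marks).
static NSString *ComposedCharacterAtIndex(NSString *s, NSUInteger index) {
    NSRange range = [s rangeOfComposedCharacterSequenceAtIndex:index];
    return [s substringWithRange:range];
}

// Hypothetical helper: reverses a string without splitting composed characters.
static NSString *ReverseByComposedCharacters(NSString *input) {
    NSMutableString *reversed = [NSMutableString stringWithCapacity:input.length];
    [input enumerateSubstringsInRange:NSMakeRange(0, input.length)
                              options:(NSStringEnumerationReverse |
                                       NSStringEnumerationByComposedCharacterSequences)
                           usingBlock:^(NSString *substring, NSRange substringRange,
                                        NSRange enclosingRange, BOOL *stop) {
        // Substrings arrive last-to-first, so appending builds the reversal.
        [reversed appendString:substring];
    }];
    return reversed;
}
```

For plain ASCII input the result is the same as a unichar-by-unichar loop; the difference only shows up for characters stored as more than one unichar.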

Using scanf with NSStrings

I want the user to input a string and then assign the input to an NSString. Right now my code looks like this:
NSString *word;
scanf("%s", &word);
The scanf function reads into a C string (actually an array of char), like this:
char word[40];
int nItems = scanf("%39s", word); // read up to 39 chars (leave room for NUL); returns the number of items assigned, not the character count
You can convert a char array into an NSString like this:
NSString *word2 = [[NSString alloc] initWithBytes:word
                                           length:strlen(word)
                                         encoding:NSUTF8StringEncoding];
However scanf only works with console (command line) programs. If you're trying to get input on a Mac or iOS device then scanf is not what you want to use to get user input.
scanf does not work with any object types. If you have a C string and want to create an NSString from it, use -[NSString initWithBytes:length:encoding:].
scanf does not work with NSString as scanf doesn’t work on objects. It works only on primitive datatypes such as:
int
float
BOOL
char
What to do?
Technically a string is made up of a sequence of individual characters. So to accept string input, you can read in the sequence of characters and convert it to a string.
use:
[NSString stringWithCString:cstring encoding:NSASCIIStringEncoding]; // NSASCIIStringEncoding has the value 1
Here is a working example:
NSLog(@"What is the first name?");
char cstring[40];
scanf("%s", cstring);
NSString *firstName = [NSString stringWithCString:cstring encoding:NSASCIIStringEncoding];
Here’s an explanation of the above code, comment by comment:
You declare a variable called cstring to hold 40 characters.
You then tell scanf to expect a list of characters by using the %s format specifier.
Finally, you create an NSString object from the list of characters that were read in.
Run your project; if you enter a word and hit Enter, the program should print out the same word you typed. Just make sure the word is less than 40 characters; if you enter more, you might cause the program to crash — you are welcome to test that out yourself! :]
Taken from: RW.
This is how I'd do it:
char word[40];
scanf("%39s", word); // the width limit leaves room for the terminating NUL
NSString *userInput = [[NSString alloc] initWithCString:word encoding:NSUTF8StringEncoding];
yes, but sscanf does, and may be a good solution for complex NSString parsing.
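Maybe this will work for you, since it accepts strings with spaces as well. It uses fgets rather than gets, because gets cannot limit the input length and was removed in C11:
NSLog(@"Enter The Name Of State");
char name[20];
// fgets reads at most sizeof(name) - 1 characters and keeps the trailing newline, if any
if (fgets(name, sizeof(name), stdin) != NULL) {
    NSLog(@"%s", name);
}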
Simple Solution is
char word[40];
scanf("%39s", word);
NSString* word2 = [NSString stringWithUTF8String:word];
The NSFileHandle class is an object-oriented wrapper for a file descriptor. For files, you can read, write, and seek within the file.
NSFileHandle *inputFile = [NSFileHandle fileHandleWithStandardInput];
NSData *inputData = [inputFile availableData];
NSString *word = [[NSString alloc]initWithData:inputData encoding:NSUTF8StringEncoding];
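The approaches above can be combined into one small routine that reads a whole line (spaces included), strips the line ending, and returns an NSString. This is a sketch under a few assumptions: the name ReadLine is hypothetical, the input is UTF-8, and lines longer than the buffer are simply cut off.

```objectivec
#import <Foundation/Foundation.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: reads one line from `stream` (for example stdin).
// Returns nil at end of input, or if the bytes read are not valid UTF-8.
static NSString *ReadLine(FILE *stream) {
    char buffer[256];
    // fgets never writes past the buffer and always NUL-terminates it.
    if (fgets(buffer, sizeof(buffer), stream) == NULL) {
        return nil;
    }
    // Cut the string at the first CR or LF, if there is one.
    buffer[strcspn(buffer, "\r\n")] = '\0';
    return [NSString stringWithUTF8String:buffer];
}
```

In a command-line tool it would be called as ReadLine(stdin). As noted above, this only makes sense for console programs, not for Mac or iOS apps with a user interface.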

simple question concerning NSString adding multiple strings

I have a fairly simple question concerning NSString; however, it doesn't seem to do what I want.
This is what I have:
NSString *title = [NSString stringWithFormat: character.name, @"is the character"];
This is a line in my parser that takes the character name and inserts it into a plist. However, it doesn't insert the @"is the character" part. Is there something I'm doing wrong?
Your code is wrong. It should be:
NSString *title
    = [NSString stringWithFormat:@"%@ is the character", character.name];
assuming that character.name is another NSString.
Read the Formatting String Objects paragraph of the String Programming Guide for Cocoa to learn everything about formatting strings.
stringWithFormat takes a format string as the first argument so, assuming character.name is the name of your character, you need:
NSString *title = [NSString stringWithFormat: @"%s is the character",
                   character.name];
What you have is the character name as the format string so, if it's @"Bob" then Bob is what you'll get. If it was @"Bob %s", that would work but would probably stuff up somewhere else that you display just the character name :-)
Note that you should use "%s" for a C string; I think "%@" is the correct format specifier if character.name is an NSString itself.
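The difference between the two specifiers can be shown side by side. This is a small sketch with made-up function names; the first takes an NSString and uses %@, the second takes a C string and uses %s.

```objectivec
#import <Foundation/Foundation.h>

// Hypothetical helper: the name is an NSString (or any object), so use %@.
static NSString *TitleForName(NSString *name) {
    return [NSString stringWithFormat:@"%@ is the character", name];
}

// Hypothetical helper: the name is a NUL-terminated C string, so use %s.
static NSString *TitleForCName(const char *name) {
    return [NSString stringWithFormat:@"%s is the character", name];
}
```

Passing an NSString to %s, or a C string to %@, is undefined behaviour and will typically crash or print garbage, so the specifier has to match the argument type.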

NSLog incorrect encoding

I've got a problem with the following code:
NSString *strValue=@"你好";
char temp[200];
strcpy(temp, [strValue UTF8String]);
printf("%s", temp);
NSLog(@"%s", temp);
In the first line of the code, two Chinese characters are enclosed in double quotes. The problem is that printf can display the Chinese characters properly, but NSLog can't.
Thanks to all. I figured out a solution for this problem. Foundation uses UTF-16 by default, so in order to use NSLog to output a C-style buffer in the example, I have to get the string as UTF-16 code units and use %S instead of %s.
NSString *strValue=@"你好";
char temp[200];
strcpy(temp, [strValue UTF8String]);
printf("%s", temp);
// Copy the UTF-16 code units directly; strcpy would stop at the first zero byte
unichar wide[100];
NSUInteger count = MIN(strValue.length, (NSUInteger)99);
[strValue getCharacters:wide range:NSMakeRange(0, count)];
wide[count] = 0; // %S expects a 16-bit NUL terminator
NSLog(@"%S", wide);
NSLog's %s format specifier is in the system encoding, which seems to always be MacRoman and not Unicode, so it can only display characters in MacRoman encoding. Your best option with NSLog is just to use the native object format specifier %@ and pass the NSString directly instead of converting it to a C string. If you only have a C string and you want to use NSLog to display a message instead of printf or asl, you will have to do something like Don suggests in order to convert the string to an NSString object first.
So, all of these should display the expected string:
NSString *str = @"你好";
const char *cstr = [str UTF8String];
NSLog(@"%@", str);
printf("%s\n", cstr);
NSLog(@"%@", [NSString stringWithUTF8String:cstr]);
If you do decide to use asl, note that while it accepts strings in UTF8 format and passes the correct encoding to the syslog daemon (so it will show up properly in the console), it encodes the string for visual encoding when displaying to the terminal or logging to a file handle, so non-ASCII values will be displayed as escaped character sequences.
My guess is that NSLog assumes a different encoding for 8-bit C-strings than UTF-8, and it may be one that doesn't support Chinese characters. Awkward as it is, you might try this:
NSLog(@"%@", [NSString stringWithCString: temp encoding: NSUTF8StringEncoding]);
I know you are probably looking for an answer that will help you understand what's going on.
But this is what you could do to solve your problem right now:
NSLog(@"%@", strValue);
#define NSLogUTF8(a,b) NSLog(a,[NSString stringWithCString:[[NSString stringWithFormat:@"%@",b] cStringUsingEncoding:NSUTF8StringEncoding] encoding:NSNonLossyASCIIStringEncoding])
#define NSLogUTF8Ex(a,b) NSLog(a,[MLTool utf8toNString:[NSString stringWithFormat:@"%@",b]])
+(NSString*)utf8toNString:(NSString*)str{
    // CFStringTransform modifies the string in place, so it needs a mutable copy
    NSMutableString *strT = [[str stringByReplacingOccurrencesOfString:@"\\U" withString:@"\\u"] mutableCopy];
    CFStringRef transform = CFSTR("Any-Hex/Java");
    // reverse = YES converts \uXXXX escapes back into the characters they name
    CFStringTransform((__bridge CFMutableStringRef)strT, NULL, transform, YES);
    return strT;
}