objective c UTF8String not working with japanese

I would like to show the NSString below on my UILabel:
NSString *strValue=@"你好";
but I cannot show it on my UILabel; I get strange characters instead!
I use this code to show the text:
[NSString stringWithCString:[strValue UTF8String] encoding:NSUTF8StringEncoding];
I tried [NSString stringWithCString:[strValue cStringUsingEncoding:NSISOLatin1StringEncoding] encoding:NSUTF8StringEncoding] and it worked,
but I cannot show emoticons with cStringUsingEncoding:NSISOLatin1StringEncoding, so I have to use UTF8String.
Any help appreciated.

Your source file is in UTF-8, but the compiler you are using thinks it's ISO-Latin 1. What you think is the string @"你好" is actually the string @"ä½ å¥½". But when you ask NSString to give you this back as ISO-Latin 1, and treat it as UTF-8, you've reversed the process the compiler took and you end up with the original string.
One solution that you can use here is to tell your compiler what encoding your source file is in. There is a compiler flag (for GCC it's -finput-charset=UTF-8, not sure about clang) that will tell the compiler what encoding to use. Curiously, UTF-8 should be the default already, but perhaps you're overriding this with a locale.
A more portable solution is to use only ASCII in your source file. You can accomplish this by replacing the non-ASCII chars with a string escape using \u1234 or \U12345678. In your case, you'd use
NSString *strValue=@"\u4F60\u597D";
Of course, once you get your string constant to be correct, you can ditch the whole encoding stuff and just use strValue directly.
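The round trip described in this answer can be reproduced at the byte level. The following is a minimal sketch in plain C (which also compiles as Objective-C), not the compiler's or NSString's actual code; the helper names latin1_to_utf8 and utf8_to_latin1 are made up for illustration. The first treats every input byte as an ISO-Latin-1 character and re-encodes it as UTF-8, which is effectively what happens to the literal at compile time. The second undoes that step, which is what the cStringUsingEncoding:NSISOLatin1StringEncoding workaround relies on.

```c
#include <stdio.h>
#include <string.h>

/* Treat each input byte as an ISO-Latin-1 character and write it as UTF-8.
   `out` must have room for 2 * strlen(in) + 1 bytes. */
static void latin1_to_utf8(const char *in, char *out) {
    const unsigned char *p = (const unsigned char *)in;
    while (*p) {
        if (*p < 0x80) {
            *out++ = (char)*p;                    /* ASCII is unchanged */
        } else {
            *out++ = (char)(0xC0 | (*p >> 6));    /* lead byte: 0xC2 or 0xC3 */
            *out++ = (char)(0x80 | (*p & 0x3F));  /* continuation byte */
        }
        p++;
    }
    *out = '\0';
}

/* Inverse step: decode UTF-8 whose code points are all below U+0100 back
   into single bytes. Returns 0 on success, -1 if a character does not fit
   into one byte. `out` must have room for strlen(in) + 1 bytes. */
static int utf8_to_latin1(const char *in, char *out) {
    const unsigned char *p = (const unsigned char *)in;
    while (*p) {
        if (*p < 0x80) {
            *out++ = (char)*p++;
        } else if ((*p & 0xFE) == 0xC2 && (p[1] & 0xC0) == 0x80) {
            *out++ = (char)(((*p & 0x03) << 6) | (p[1] & 0x3F));
            p += 2;
        } else {
            return -1;
        }
    }
    *out = '\0';
    return 0;
}
```

Feeding the six UTF-8 bytes of 你好 (E4 BD A0 E5 A5 BD) through latin1_to_utf8 gives the twelve bytes that display as the garbled text, and utf8_to_latin1 turns them back into the original six. A real emoji or Chinese character that was never garbled cannot pass through utf8_to_latin1, which matches the observation that the workaround breaks emoticons.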

Related

Parsing file with percent signs (%) in Objective-C

I'm writing a parser for fortune files. Fortune is a small app on *nix platforms that just prints out a random "fortune". The fortune files are straight text, with each fortune being separated by a percent sign on its own line. For example:
A little suffering is good for the soul.
-- Kirk, "The Corbomite Maneuver", stardate 1514.0
%
A man either lives life as it happens to him, meets it head-on and
licks it, or he turns his back on it and starts to wither away.
-- Dr. Boyce, "The Menagerie" ("The Cage"), star date unknown
%
What I've found is that when parsing the file, stringWithContentsOfFile returns a string with the % signs in place. For example:
@"A little suffering is good for the soul.\n\t\t-- Kirk, \"The Corbomite Maneuver\", stardate 1514.0\n%\nA man either lives life as it happens to him, meets it head-on and\nlicks it, or he turns his back on it and starts to wither away.\n\t\t-- Dr. Boyce, \"The Menagerie\" (\"The Cage\"), stardate unknown\n%"
However, when I call componentsSeparatedByCharactersInSet on the file contents, everything is parsed as a string, with the exception of the percent signs, which are NSTaggedPointerString. When I print out the lines, the percent signs are gone.
Is this because the percent sign is a format specifier for strings? I would think in that case that the initial content pull would escape those.
Here's the code:
NSFileManager *fileManager;
fileManager = [NSFileManager defaultManager];
NSStringEncoding stringEncoding;
// NSString *fileContents = [NSString stringWithContentsOfFile:fileName encoding:NSASCIIStringEncoding error:nil];
NSString *fileContents = [NSString stringWithContentsOfFile:fileName usedEncoding:&stringEncoding error:nil];
NSArray *fileLines = [fileContents componentsSeparatedByCharactersInSet:[NSCharacterSet newlineCharacterSet]];
The used encoding ends up being UTF-8. You can see I have also tried specifying plain ASCII, but it yields the same results.
So the question is, how do I retain the percent signs? Or maybe I should use the percent sign as the separator character and then parse each of the resulting pieces individually.

You are calling NSLog() but passing the line strings as the format string. Something like:
NSLog(lineString);
Therefore, any percent characters in the line strings are interpreted as format specifiers. You should (almost) never pass strings that come from outside sources — i.e. strings which are not hard-coded in your code — as format strings to any function (NSLog(), printf(), +[NSString stringWithFormat:], etc.). It's not safe and you'll sometimes get unexpected results like you've seen.
You should always log a single string like this:
NSLog(@"%@", lineString);
That is, you need to pass a hard-coded format string and use the foreign string as data for that to format.
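The same rule holds for the C printf family, which NSLog's format handling is modelled on. A minimal sketch in plain C, assuming a hypothetical helper named format_line: the format string is a fixed literal and the untrusted line is passed only as data, so every % in it survives.

```c
#include <stddef.h>
#include <stdio.h>

/* Copy an untrusted line into `dst` using a fixed format string.
   Because `line` is only ever an argument, never the format itself,
   any '%' it contains is copied verbatim.
   Returns what snprintf returns: the length the full result needs. */
static int format_line(char *dst, size_t dst_size, const char *line) {
    return snprintf(dst, dst_size, "%s", line);
}
```

Calling snprintf(dst, dst_size, line) instead would make the library interpret a lone % followed by a newline as a (malformed) conversion, which is the C counterpart of the vanished percent signs in the question.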

NSTaggedPointerString is just a subclass of NSString. You can use it anywhere you would use an NSString.
But in your string
@"A little suffering is good for the soul.\n\t\t-- Kirk, \"The Corbomite Maneuver\", stardate 1514.0\n%\nA man either lives life as it happens to him, meets it head-on and\nlicks it, or he turns his back on it and starts to wither away.\n\t\t-- Dr. Boyce, \"The Menagerie\" (\"The Cage\"), stardate unknown\n%"
a lone % is not treated as a literal percent sign when the string is used as a format string. In an Objective-C format string, a literal percent sign is written by doubling the % mark:
@"%%"

Prevent NSString to escape contents

Sorry for maybe a newbie question.
For various reasons I am stuck with a peculiar string that looks like this:
NSString *myString = @"A\\314\\212A\\314\\210O\\314\\210.jpg";
Can I in some ninja way remove the double \\ and force NSString to understand that the string is Unicode-encoded and should be read like this:
NSString *myString = @"A\314\212A\314\210O\314\210.jpg"; // Displays ÅÄÖ as expected
I have tried different strategies to replace all the backslashes ("\"), but as soon as I add a "\", NSString adds another one to escape the first one. And I get stuck here...
Is it possible to prevent NSString from escaping my string?
UPDATE
I am aware this is a special case. I am reading the output from a terminal program which reads files on the user's drive. Via an NSTask I capture the output into an NSString for parsing and splitting it into an array. It works great as long as there are no non-ASCII characters. HFS+ encodes non-ASCII characters in a slightly different Unicode form called NFD.
When I capture the response, the ÅÄÖ are already encoded inside quotes like this:
file.jpg
file2.jpg
"A\314\212A\314\210O\314\210.jpg"
When I create an NSString with the captured response, it gets escaped by NSString a second time:
A\\314\\212A\\314\\210O\\314\\210.jpg
I am aware that this is not optimal, but right now I have no control over what the terminal program outputs. Usually when an NSString is created with this NFD encoding, Objective-C takes care of the encoding/decoding for you. But since I have a string with mixed and double-escaped content, I have a hard time creating it and making NSString understand that the content is encoded this way.
Basically I would like to do this:
decodedString = [output stringByReplacingOccurrencesOfString:@"\\\\"
withString:@"\\"];
But behind the scenes NSString always escapes \ with another \ for you, so I would like a way to create "raw" strings without NSString interfering.
I have tried various ways of enforcing a Unicode encoding on NSString, but it all boils down to NSString always capturing and escaping \.
Any tips or pointers appreciated!

I did not find any way around this other than go the other way around and change the output from the terminal program not to encode it this way.
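If changing the tool's output is not an option, the escapes could in principle be decoded by hand. The following is a minimal sketch in plain C, assuming the captured text really contains a literal backslash followed by exactly three octal digits for each non-ASCII byte (as in "A\314\212..."), and that the decoded bytes are UTF-8. decode_octal_escapes is a hypothetical helper, not a Foundation API.

```c
#include <stddef.h>

/* Replace every "\ooo" sequence (backslash + three octal digits, value
   at most 0377) with the single byte it names; copy everything else
   unchanged. `out` must have room for strlen(in) + 1 bytes.
   Note: "\000" would produce an embedded NUL and cut the C string short. */
static void decode_octal_escapes(const char *in, char *out) {
    while (*in) {
        if (in[0] == '\\' &&
            in[1] >= '0' && in[1] <= '3' &&
            in[2] >= '0' && in[2] <= '7' &&
            in[3] >= '0' && in[3] <= '7') {
            *out++ = (char)(((in[1] - '0') << 6) |
                            ((in[2] - '0') << 3) |
                             (in[3] - '0'));
            in += 4;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```

The decoded buffer could then be handed to +[NSString stringWithUTF8String:]. The surrounding quotes that the tool adds around such file names would still need to be stripped separately.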

Objective-C / C Convert UTF8 Literally to Real string

I'm wondering how to convert
NSString = "\xC4"; ....
to a real NSString represented in its normal format.

Fundamentally related to xcode UTF-8 literals. Of course, it is ambiguous what you actually mean by "\xC4" - without an encoding specified, it means nothing.
If you mean the character whose Unicode code point is 0x00C4 then I would think (though I haven't tested) that this will do what you want.
NSString *s = @"\u00C4";

First are you sure you have \xC4 in your string? Consider:
NSString *one = @"\xC4\x80";
NSString *two = @"\\xC4\\x80";
NSLog(@"%@ | %@", one, two);
This will output:
Ā | \xC4\x80
If you are certain your string contains the four characters \xC4 are you sure it is UTF-8 encoded as ASCII? Above you will see I added \x80, this is because \xC4 is not valid UTF-8, it is the first byte of a two-byte sequence. Maybe you have only shown a sample of your input and the second byte is present, if not you do not have UTF-8 encoded as ASCII.
If you are certain it is UTF-8 encoded as ASCII you will have to convert it yourself. It might seem the Cocoa string encoding methods would handle it, especially as what you appear to have is a string as it might be written in Objective-C source code. Unfortunately the obvious encoding, NSNonLossyASCIIStringEncoding, only handles octal and Unicode escapes, not the hexadecimal escapes in your string.
You can use any algorithm you like to convert it. One choice would be a simple finite state machine which scans the input a byte at a time and recognises the four byte sequence: \, x, hex-digit, hex-digit; and combines the two hex-digits into a single byte. NSString is not the best choice for byte-at-time string processing, you may be better off converting to C strings, e.g.:
// sample input, all characters should be ASCII
NSString *input = @"\\xC4\\x80";
// obtain a C string containing the ASCII characters
const char *cInput = [input cStringUsingEncoding:NSASCIIStringEncoding];
// allocate a buffer of the correct length for the result
// (decoding only ever shrinks the string, so the input length is enough)
char cOutput[strlen(cInput)+1];
// call your function to decode the hexadecimal escapes
convertAsciiEncodedUTF8(cInput, cOutput);
// create a NSString from the result
NSString *output = [NSString stringWithCString:cOutput encoding:NSUTF8StringEncoding];
You just need to write the finite state machine, or other algorithm, for convertAsciiEncodedUTF8.
(If you write an algorithm and it fails ask another question showing your code, somebody will probably help you. But don't expect someone to write it for you.)
HTH
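For reference, one possible shape for convertAsciiEncodedUTF8, sketched in plain C. The answer above only names the function; the body below and the helper hex_value are assumptions made for illustration. It scans the input a byte at a time, recognises the four-byte sequence \, x, hex-digit, hex-digit, and emits the single byte those two digits describe. Anything else is copied through unchanged.

```c
#include <stddef.h>

/* Value of one hexadecimal digit, or -1 if `c` is not a hex digit. */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode "\xHH" escapes in `in` to raw bytes in `out`.
   `out` must have room for strlen(in) + 1 bytes. No check is made that
   the resulting bytes form valid UTF-8; "\x00" would truncate the result. */
static void convertAsciiEncodedUTF8(const char *in, char *out) {
    while (*in) {
        int hi = -1, lo = -1;
        if (in[0] == '\\' && in[1] == 'x' &&
            (hi = hex_value(in[2])) >= 0 &&
            (lo = hex_value(in[3])) >= 0) {
            *out++ = (char)((hi << 4) | lo);  /* combine the two digits */
            in += 4;
        } else {
            *out++ = *in++;                   /* ordinary character */
        }
    }
    *out = '\0';
}
```

With the sample input from the answer, the eight ASCII characters \xC4\x80 become the two bytes C4 80, which stringWithCString:encoding: with NSUTF8StringEncoding would read as Ā.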

NSString Decoding Problem

This string is a base64-encoded string:
NSString *string=@"ë§ë ë¼ì´";
This does not show the original string:
NSLog(@"String is %@",[string cStringUsingEncoding:NSMacOSRomanStringEncoding]);

That's not a Base64-encoded string. There are a couple other things going on with your code, too:
You can't include literal non-ASCII characters inside a string constant; rather, you have to use the bytes that make up the character, prefixed with \x; or in the case of Unicode, you can use the Unicode code point, prefixed with \u. So your string should look something like NSString *string = @"\x91\xa4\x91 \x91\x93";. But...
The characters ¼ and ´ aren't part of the MacRoman encoding, so you'll have trouble using them. Are you sure you want a MacRoman string, rather than a Unicode string? Not many applications use MacRoman anymore, anyway.
cStringUsingEncoding: returns a C string, which should be printed with %s, not %@, since it's not an Objective-C object.
That said, your code will sort of work with:
// Using MacRoman encoding in string constant
NSString *s = @"\x91\xa4\x91 \x91\x93";
NSLog(@"%s", [s cStringUsingEncoding:NSMacOSRomanStringEncoding]);
I say "sort of work" because, again, you can't represent that code in MacRoman.

That would be because Mac OS Roman is nothing like base-64 encoding. Base-64 encoding is a further encoding applied to the bytes that represent the original string. If you want to see the original string, you will first need to base-64 decode the byte string and then figure out the original string encoding in order to interpret it.

Spanish characters replaced with weird string when 'stringWithFormat' is used?

NSString *myString = [NSString stringWithFormat:@"%@",BernabÈu];
NSLog(@"%@", myString);
Above statement prints:
Bernab\u00c8u
Here 'BernabÈu' is Spanish character string.
Why is the "\u00c8u" appended? How to get rid of it?

That's because '\u00c8' is the Unicode representation of the E. I don't have the code handy, but I think you will have to look into using locales to get it to print with the correct character. But don't worry. Java still understands that this is an E.
(don't have the correct 'E' handy either :-)
