utf8_decode for objective-c [duplicate] - objective-c

Possible Duplicate:
unicode escapes in objective-c
I have a LATIN1 string.
Artîsté
When I json_encode it, it escapes some chars and converts it to single byte UTF8.
Art\u00eest\u00e9
If I just json_decode it, I believe it is decoding in UTF8
Artîsté
In order to get my original string back, I have to call utf8_decode
Artîsté
Is there a way to handle this conversion in objective-c?

You might be looking for this:
NSString *string = (some string with non-ASCII characters in it);
char const *string_as_latin1 = [string cStringUsingEncoding:NSISOLatin1StringEncoding];
or possibly this:
NSData *data_latin1 = [string dataUsingEncoding:NSISOLatin1StringEncoding allowLossyConversion:YES];
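For illustration, the byte-level effect of such a conversion can be sketched in plain C, which also compiles inside an Objective-C file. The name utf8ToLatin1 is hypothetical, and this only approximates what PHP's utf8_decode does: any character that does not fit in Latin-1 becomes '?'.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: converts NUL-terminated UTF-8 in `in` to ISO-8859-1
 * in `out`. Characters above U+00FF and malformed bytes become '?'.
 * `out` needs room for strlen(in) + 1 bytes. Returns the output length.
 */
size_t utf8ToLatin1(const char *in, char *out)
{
    assert(in != NULL && out != NULL);
    const unsigned char *p = (const unsigned char *)in;
    char *start = out;
    while (*p != 0) {
        if (*p < 0x80) {
            /* plain ASCII is identical in both encodings */
            *out++ = (char)*p++;
        } else if ((*p == 0xC2 || *p == 0xC3) && (p[1] & 0xC0) == 0x80) {
            /* two-byte sequence for U+0080..U+00FF */
            *out++ = (char)(((*p & 0x03) << 6) | (p[1] & 0x3F));
            p += 2;
        } else {
            /* not representable in Latin-1: emit one '?' per sequence */
            *out++ = '?';
            p++;
            while ((*p & 0xC0) == 0x80) p++;  /* skip continuation bytes */
        }
    }
    *out = '\0';
    return (size_t)(out - start);
}
```

Given the UTF-8 bytes "Art\xc3\xaest\xc3\xa9", this writes the seven Latin-1 bytes "Art\xeest\xe9".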

I have a LATIN1 string.
I don't think you do. Assuming you are talking about PHP, json_encode() only accepts UTF-8 strings, and bails out if it hits a non-UTF-8 high-byte sequence:
json_encode("Art\xeest\xe9")
"Art"
json_encode("Art\xc3\xaest\xc3\xa9")
"Art\u00eest\u00e9"
I think you had a proper UTF-8 string to start with, then you encoded and decoded it to get the exact same UTF-8 string back. But then you're displaying it or processing it in another step you haven't shown us, that treats your string as if it were Latin-1.

Related

Objective-C / C Convert UTF8 Literally to Real string

I'm wondering how to convert
NSString = "\xC4"; ....
to a real NSString represented in normal format.
Fundamentally related to xcode UTF-8 literals. Of course, it is ambiguous what you actually mean by "\xC4" - without an encoding specified, it means nothing.
If you mean the character whose Unicode code point is 0x00C4 then I would think (though I haven't tested) that this will do what you want.
NSString *s = @"\u00C4";
First are you sure you have \xC4 in your string? Consider:
NSString *one = @"\xC4\x80";
NSString *two = @"\\xC4\\x80";
NSLog(@"%@ | %@", one, two);
This will output:
Ā | \xC4\x80
If you are certain your string contains the four characters \xC4, are you sure it is UTF-8 encoded as ASCII? Above you will see I added \x80. This is because \xC4 on its own is not valid UTF-8; it is the first byte of a two-byte sequence. Maybe you have only shown a sample of your input and the second byte is present; if not, you do not have UTF-8 encoded as ASCII.
If you are certain it is UTF-8 encoded as ASCII, you will have to convert it yourself. It might seem the Cocoa string encoding methods would handle it, especially as what you appear to have is a string as it might be written in Objective-C source code. Unfortunately the obvious encoding, NSNonLossyASCIIStringEncoding, only handles octal and Unicode escapes, not the hexadecimal escapes in your string.
You can use any algorithm you like to convert it. One choice would be a simple finite state machine which scans the input a byte at a time, recognises the four-byte sequence \, x, hex-digit, hex-digit, and combines the two hex digits into a single byte. NSString is not the best choice for byte-at-a-time string processing; you may be better off converting to C strings, e.g.:
// sample input, all characters should be ASCII
NSString *input = @"\\xC4\\x80";
// obtain a C string containing the ASCII characters
const char *cInput = [input cStringUsingEncoding:NSASCIIStringEncoding];
// allocate a buffer of the correct length for the result
char cOutput[strlen(cInput)+1];
// call your function to decode the hexadecimal escapes
convertAsciiEncodedUTF8(cInput, cOutput);
// create a NSString from the result
NSString *output = [NSString stringWithCString:cOutput encoding:NSUTF8StringEncoding];
You just need to write the finite state machine, or other algorithm, for convertAsciiEncodedUTF8.
(If you write an algorithm and it fails ask another question showing your code, somebody will probably help you. But don't expect someone to write it for you.)
HTH
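As a starting point, here is one possible sketch of convertAsciiEncodedUTF8 in plain C. It uses a simple lookahead scan rather than an explicit state machine, and the helper hexValue is a hypothetical name. It assumes the input is a NUL-terminated ASCII string and that the output buffer holds at least strlen(in) + 1 bytes.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Value of one hexadecimal digit; the caller must have checked isxdigit(). */
static int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return c - 'A' + 10;
}

/*
 * Copies `in` to `out`, replacing each four-character sequence
 * backslash, 'x', hex digit, hex digit with the single byte it names.
 * Everything else is copied unchanged. The result is never longer
 * than the input.
 */
void convertAsciiEncodedUTF8(const char *in, char *out)
{
    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        /* short-circuit evaluation keeps the lookahead inside the string */
        if (in[0] == '\\' && in[1] == 'x'
            && isxdigit((unsigned char)in[2])
            && isxdigit((unsigned char)in[3])) {
            *out++ = (char)(hexValue(in[2]) * 16 + hexValue(in[3]));
            in += 4;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```

One limitation to be aware of: an escaped \x00 produces a NUL byte, which ends the C string early. The sketch also does not check that the decoded bytes form valid UTF-8.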

How to put string into string after specific string

I am new to programming. I have the string NSString *string = @"\U0420\U043e\U0437\U044b"; and after each backslash ('\') I need to put another backslash to get a string like this: @"\\U0420\\U043e\\U0437\\U044b"
I am new to programming and Objective-C. Please help.
My original answer was:
Use [NSString stringByReplacingOccurrencesOfString:withString:] (reference).
NSString *string = @"\U0420\U043e\U0437\U044b";
NSString *converted = [string stringByReplacingOccurrencesOfString:@"\\"
                                                        withString:@"\\\\"];
However, I now don't think that's right, given that the \ characters won't actually exist in string; instead the compiler will convert each of those sequences into a Unicode character. You will need to write the string literal like this:
NSString *string = @"\\U0420\\U043e\\U0437\\U044b";
in order to use the above code. I cannot see any alternative to this.
Further Update: Often when I've come across questions like this there is a confusion between string literals and string data. In your question those \ characters won't appear as the compiler will have converted them into unicode characters (\Uxxx is a unicode escape sequence for a single character). However if you provided a string like that at runtime (say read from a text file) then those \ characters will exist and you can use the code above.
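To make the literal-versus-data distinction concrete, here is a small C sketch (the function name doubleBackslashes is hypothetical) that doubles every backslash actually present in the data at runtime. A string whose escapes were already compiled into characters contains no backslashes, so it passes through unchanged.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copies `in` to `out`, writing every backslash twice. `out` needs room
 * for 2 * strlen(in) + 1 bytes. Returns the length of the result.
 */
size_t doubleBackslashes(const char *in, char *out)
{
    assert(in != NULL && out != NULL);
    size_t n = 0;
    for (; *in != '\0'; in++) {
        if (*in == '\\') out[n++] = '\\';  /* extra copy of the backslash */
        out[n++] = *in;
    }
    out[n] = '\0';
    return n;
}
```

Passing the six characters \U0420 (written "\\U0420" in source) yields the seven characters \\U0420, while passing the two UTF-8 bytes of the compiled character U+0420 returns them untouched.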

unicode escapes in objective-c

I have a string "Artîsté". I use json_encode from PHP on it and I get "Art\u00eest\u00e9".
How do I convert that to an NSString? I have tried many things and none of them work; I always end up getting Artîsté.
For example:
[NSString stringWithUTF8String:"Art\u00c3\u00aest\u00c3\u00a9"]; // Artîsté
@"Art\u00c3\u00aest\u00c3\u00a9"; // Artîsté
You can use CFStringCreateFromExternalRepresentation with the kCFStringEncodingNonLossyASCII encoding to parse the \uXXXX escape sequences. Check out my answer here:
Converting escaped UTF8 characters back to their original form
The problem is your input string:
"Art\u00c3\u00aest\u00c3\u00a9"
does in fact literally mean "Artîsté". \u00c3 is 'Ã', \u00ae is '®', and \u00a9 is '©'.
Whatever is producing your input string is receiving UTF-8 input but expecting something else (e.g., cp1252, ISO-8859-1, or ISO-8859-15)
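The effect can be reproduced with a few lines of C. This is only a sketch of the presumed failure mode, with a hypothetical function name latin1ToUTF8: it treats every input byte as one ISO-8859-1 character and re-encodes it as UTF-8. Feeding it bytes that are already UTF-8 produces exactly the doubled encoding seen in the question.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Re-encodes `in` as UTF-8 on the assumption that every input byte is one
 * ISO-8859-1 character. `out` needs room for 2 * strlen(in) + 1 bytes.
 * Returns the number of bytes written, not counting the terminator.
 */
size_t latin1ToUTF8(const char *in, char *out)
{
    assert(in != NULL && out != NULL);
    char *start = out;
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        if (c < 0x80) {
            *out++ = (char)c;                    /* ASCII stays one byte */
        } else {
            *out++ = (char)(0xC0 | (c >> 6));    /* lead byte: 0xC2 or 0xC3 */
            *out++ = (char)(0x80 | (c & 0x3F));  /* continuation byte */
        }
    }
    *out = '\0';
    return (size_t)(out - start);
}
```

For the correct UTF-8 bytes "Art\xc3\xaest\xc3\xa9", the two bytes C3 AE of 'î' come out as C3 83 C2 AE, which is the UTF-8 encoding of 'Ã' followed by '®'.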

Xcode Sqlite Encoding Turkish characters

We have records like 'GÜLHAN', 'Yılan', 'çekiç' in our SQLite database.
These words include Turkish characters, and the problem is that we cannot read them correctly; for example, we read 'GEDf∞k' instead of 'GEDİK'.
How can we solve this sqlite reading problem in xcode?
What encoding did you use to store data in DB? Should not be any problems if it's UTF8.
char *data = (char *) sqlite3_column_text (stmt, 1);
NSString *string = [NSString stringWithUTF8String:data];
If this gives you unexpected results, then it's not UTF8 and it's probably a good idea to re-encode everything in DB to UTF8 first.
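One way to find out whether the stored bytes really are UTF-8 is to validate them before building the NSString. The following is a minimal sketch in plain C; the function name isValidUTF8 is hypothetical, and it assumes the column text is a NUL-terminated byte string such as the one returned by sqlite3_column_text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true when the NUL-terminated byte string `s` is well-formed UTF-8.
 * Rejects stray continuation bytes, truncated sequences, overlong forms,
 * surrogates (U+D800..U+DFFF) and values above U+10FFFF.
 */
bool isValidUTF8(const char *s)
{
    assert(s != NULL);
    const unsigned char *p = (const unsigned char *)s;
    while (*p != 0) {
        unsigned int cp, min;
        int extra;
        if (*p < 0x80)                { p++; continue; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; min = 0x80; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; min = 0x800; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; min = 0x10000; }
        else return false;            /* stray continuation or invalid lead */
        p++;
        for (; extra > 0; extra--, p++) {
            /* a NUL terminator fails this test too, so we never overrun */
            if ((*p & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (*p & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
    }
    return true;
}
```

With this check, the UTF-8 bytes of 'Yılan' ("Y\xc4\xb1lan") pass, while a Latin-1 style 'GÜLHAN' ("G\xdcLHAN") fails, which would point to a database that needs re-encoding.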

NSString Decoding Problem

This string is a base64-encoded string:
NSString *string=@"ë§ë ë¼ì´";
This does not show the original string:
NSLog(@"String is %@",[string cStringUsingEncoding:NSMacOSRomanStringEncoding]);
That's not a Base64-encoded string. There are a couple other things going on with your code, too:
You can't include literal non-ASCII characters inside a string constant; rather, you have to use the bytes that make up the character, prefixed with \x; or in the case of Unicode, you can use the Unicode code point, prefixed with \u. So your string should look something like NSString *string = @"\x91\xa4\x91 \x91\x93";. But...
The characters ¼ and ´ aren't part of the MacRoman encoding, so you'll have trouble using them. Are you sure you want a MacRoman string, rather than a Unicode string? Not many applications use MacRoman anymore, anyway.
cStringUsingEncoding: returns a C string, which should be printed with %s, not %@, since it's not an Objective-C object.
That said, your code will sort of work with:
// Using MacRoman encoding in string constant
NSString *s = @"\x91\xa4\x91 \x91\x93";
NSLog(@"%s", [s cStringUsingEncoding:NSMacOSRomanStringEncoding]);
I say "sort of work" because, again, you can't represent that code in MacRoman.
That would be because Mac OS Roman is nothing like Base64 encoding. Base64 is a further encoding applied to the bytes that represent the original string. If you want to see the original string, you will first need to Base64-decode the byte string and then figure out the original string encoding in order to interpret it.
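The first of those two steps can be sketched in plain C as follows. The names base64Value and base64Decode are hypothetical, the sketch assumes the standard Base64 alphabet on an ASCII platform, and it simply stops at the first padding or foreign character instead of reporting an error.

```c
#include <assert.h>
#include <stddef.h>

/* Maps one Base64 alphabet character to its 6-bit value, or -1. */
static int base64Value(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/*
 * Decodes Base64 text from `in` into `out` and returns the number of bytes
 * written. Decoding stops at the first '=' or any other character outside
 * the alphabet. `out` needs room for 3 * strlen(in) / 4 bytes.
 */
size_t base64Decode(const char *in, unsigned char *out)
{
    assert(in != NULL && out != NULL);
    size_t n = 0;
    unsigned int acc = 0;  /* bit accumulator; old high bits are masked off */
    int bits = 0;          /* number of not yet emitted bits in acc */
    for (; *in != '\0'; in++) {
        int v = base64Value(*in);
        if (v < 0) break;
        acc = (acc << 6) | (unsigned int)v;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out[n++] = (unsigned char)((acc >> bits) & 0xFF);
        }
    }
    return n;
}
```

For example, "QXJ0w6k=" decodes to the five bytes 41 72 74 C3 A9. Only the second step, choosing an encoding, turns those bytes into text: read as UTF-8 they spell "Arté". The bytes and their length can then be handed to initWithBytes:length:encoding: with whichever encoding the original string used.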