unicode escapes in objective-c

I have a string "Artîsté". I use json_encode from PHP on it and I get "Art\u00eest\u00e9".
How do I convert that to an NSString? I have tried many things and none of them work; I always end up getting ArtÃ®stÃ©.
For example:
[NSString stringWithUTF8String:"Art\u00c3\u00aest\u00c3\u00a9"]; // ArtÃ®stÃ©
@"Art\u00c3\u00aest\u00c3\u00a9"; // ArtÃ®stÃ©

You can use CFStringCreateFromExternalRepresentation with the kCFStringEncodingNonLossyASCII encoding to parse the \uXXXX escape sequences. Check out my answer here:
Converting escaped UTF8 characters back to their original form
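For readers who want to see the \uXXXX parsing step in isolation, here is a minimal sketch in Python rather than Objective-C. It does not use CFStringCreateFromExternalRepresentation; it hands the escaped text to a JSON parser instead, which understands the same escape syntax. The helper name decode_u_escapes is made up for this illustration, and it assumes the input has no unescaped double quotes or control characters.

```python
import json


def decode_u_escapes(escaped: str) -> str:
    """Decode JSON-style \\uXXXX escapes (hypothetical helper for illustration)."""
    # Wrapping the text in double quotes turns it into a JSON string literal,
    # so the JSON parser resolves the \uXXXX sequences for us.
    return json.loads('"' + escaped + '"')


# A raw string: the backslashes are literal, so this is 17 plain ASCII characters.
escaped = r"Art\u00eest\u00e9"
decoded = decode_u_escapes(escaped)

print(len(escaped))  # 17
print(decoded)       # Artîsté
print(len(decoded))  # 7
```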

The problem is your input string:
"Art\u00c3\u00aest\u00c3\u00a9"
does in fact literally mean "ArtÃ®stÃ©": \u00c3 is 'Ã', \u00ae is '®', and \u00a9 is '©'.
Whatever is producing your input string is receiving UTF-8 bytes but interpreting them as something else (e.g., cp1252, ISO-8859-1, or ISO-8859-15), so each byte of a two-byte UTF-8 sequence ends up as its own character.
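The mix-up described above can be reproduced in a few lines. This is a sketch in Python, not the asker's actual pipeline: it encodes the string as UTF-8, deliberately decodes those bytes as ISO-8859-1 (latin-1), and then reverses the mistake.

```python
original = "Artîsté"

# Correct UTF-8 bytes: î becomes C3 AE, é becomes C3 A9.
utf8_bytes = original.encode("utf-8")

# The bug: reading those bytes as a single-byte encoding,
# so every byte turns into its own character.
misread = utf8_bytes.decode("latin-1")
print(misread)                                      # ArtÃ®stÃ©
print(misread == "Art\u00c3\u00aest\u00c3\u00a9")   # True

# Reversing the mistake recovers the original text.
repaired = misread.encode("latin-1").decode("utf-8")
print(repaired)                                     # Artîsté
```

If the output of json_encode already contains \u00c3\u00ae, the damage happened before encoding, so the fix belongs on the producing side rather than in the Objective-C code.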

Related

Inserting string as regular string in mongodb

The pymongo documentation says that BSON strings are UTF-8 encoded so PyMongo must ensure that any strings it stores contain only valid UTF-8 data. Unicode strings (<type 'unicode'>) are encoded UTF-8 first. The reason our example string is represented in the Python shell as u'Mike' instead of 'Mike' is that PyMongo decodes each BSON string to a Python unicode string, not a regular str.
So I understand that to get rid of the Unicode literal 'u', I will have to call json.dumps() on the document returned by the query.
The documentation also says that Regular strings (<type ‘str’>) are validated and stored unaltered. And I am assuming that the query result also throws it back as a regular string and not a Unicode string.
I created a dictionary with regular string types and inserted it into the DB, but when I retrieve it, I get the strings back as Unicode. Any idea how to get regular strings instead? The purpose is to avoid calling json.dumps() on the query result. I need to fetch a large number of documents from the DB, and json.dumps() takes quite some time. The strings that I am storing contain only ASCII data, so I don't need Unicode strings.
The assumption that a regular string is returned as a regular string was not correct. It is stored unaltered, without being re-encoded, because ASCII data is already valid UTF-8. When decoding during the query, everything is converted back to Unicode.
Source:
Automatic string to unicode object conversion
How can I get pymongo to always return str and not unicode?
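To make the write and read paths concrete, here is a small sketch that imitates the behavior described above without using PyMongo at all. The functions to_bson_utf8 and from_bson_utf8 are hypothetical stand-ins, not PyMongo APIs. It is written for Python 3, where bytes plays the role of Python 2's str and str plays the role of Python 2's unicode.

```python
def to_bson_utf8(value):
    """Hypothetical write path: return the UTF-8 bytes that would be stored."""
    if isinstance(value, bytes):
        # Byte strings are only validated; invalid UTF-8 raises UnicodeDecodeError.
        value.decode("utf-8")
        return value  # stored unaltered
    # Text strings are encoded to UTF-8 first.
    return value.encode("utf-8")


def from_bson_utf8(raw):
    """Hypothetical read path: stored strings are always decoded to text."""
    return raw.decode("utf-8")


stored_from_bytes = to_bson_utf8(b"Mike")
stored_from_text = to_bson_utf8("Mike")

# Both inputs end up as the same stored bytes...
print(stored_from_bytes == stored_from_text)              # True
# ...so the read path cannot know which type was inserted and returns text.
print(type(from_bson_utf8(stored_from_bytes)).__name__)   # str
```

Since the stored bytes are identical either way, the type that comes back is decided entirely by the decoder, which matches the conclusion above.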

Whitespace encoding using stringByAddingPercentEscapesUsingEncoding

I am encoding white spaces in a string using
[@"iPhone Content.doc" stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding]
when sending a message with SKPSMTP. But when the mail is received, the attachment name appears as iPhone%20Content.doc: instead of a space it shows %20. How can this be avoided, or the name encoded correctly?
If you're doing stringByAddingPercentEscapesUsingEncoding then you're going to get percent signs in your result string... You can either use something different, or go back through and remove the percent signs later.
From the doc:
stringByAddingPercentEscapesUsingEncoding: Returns a representation of
the receiver using a given encoding to determine the percent escapes
necessary to convert the receiver into a legal URL string.
aka, "this method adds percent signs". If you want to reverse this process, use stringByReplacingPercentEscapesUsingEncoding
Just a side note, %20 is there because the hex representation of the space character is 20 and the % sign is an escape. You only need to do this for URLs, as they disallow the use of whitespace characters.
I found the solution to my question. I had actually missed setting the "" on a string.
Of course the remote receiver cannot accept a URL with whitespace, so we must convert the URL address using the stringByAddingPercentEscapesUsingEncoding function.
This function replaces spaces in the URL expression with %20. It is especially useful when the URL contains non-ASCII characters: you have to use the function to percent-escape the URL so that the remote server can accept your request.
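The add/replace pair discussed in this thread behaves like percent-encoding in any other language. As a rough illustration in Python (using the standard urllib.parse module, not the Cocoa methods), escaping and unescaping round-trip like this:

```python
from urllib.parse import quote, unquote

filename = "iPhone Content.doc"

# Escaping: the space (hex 20) becomes %20.
escaped = quote(filename)
print(escaped)             # iPhone%20Content.doc

# Unescaping reverses it, which is what a display name needs.
print(unquote(escaped))    # iPhone Content.doc

# Non-ASCII characters are escaped byte by byte from their UTF-8 encoding.
print(quote("Artîsté"))    # Art%C3%AEst%C3%A9
```

The escaped form is only appropriate where a URL is expected; an attachment filename is not a URL, so it should be passed unescaped.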

NSJSONSerialization parsing special characters

I am parsing some data using NSJSONSerialization. After parsing, I get strings like &auml; and &#339;, which I think has something to do with encoding. But NSJSONSerialization doesn't ask which encoding to use; I guess it detects it by itself. So my question is: how can I get proper strings instead of these weird &auml; and &#339;?
NSJSONSerialization assumes the encoding is one of the Unicode encodings. Make sure the data you pass to it is in UTF-8 (or UTF-16). ä is C3 A4 in UTF-8 or 00 E4 in UTF-16 (big-endian byte order).
Note that the default encoding for HTTP if none is specified is ISO-8859-1, so it may be that you are passing ISO-8859-1 data instead of UTF-8.
In options, try NSJSONReadingMutableLeaves; it should return the leaf strings as NSMutableString instances. For more, take a look at the docs.
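The point about passing UTF-8 rather than ISO-8859-1 data can be sketched in Python, whose json module likewise only accepts Unicode encodings for byte input. The helper parse_json_bytes and its fallback parameter are assumptions made for this example; falling back to latin-1 is a guess about the sender, not something a parser can detect reliably.

```python
import json


def parse_json_bytes(data: bytes, fallback: str = "latin-1"):
    """Parse JSON bytes, transcoding from a fallback encoding if they are not valid Unicode."""
    try:
        # json.loads accepts bytes only in UTF-8, UTF-16 or UTF-32.
        return json.loads(data)
    except UnicodeDecodeError:
        # Assumption: the sender used the fallback encoding instead.
        return json.loads(data.decode(fallback))


print("ä".encode("utf-8").hex())      # c3a4
print("ä".encode("utf-16-be").hex())  # 00e4

payload = '{"name": "ä"}'
print(parse_json_bytes(payload.encode("utf-8"))["name"])    # ä
print(parse_json_bytes(payload.encode("latin-1"))["name"])  # ä
```

In the Objective-C case the equivalent step would be converting the received NSData to UTF-8 before handing it to NSJSONSerialization.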

utf8_decode for objective-c [duplicate]

Possible Duplicate:
unicode escapes in objective-c
I have a LATIN1 string.
Artîsté
When I json_encode it, it escapes some chars and converts it to single byte UTF8.
Art\u00eest\u00e9
If I just json_decode it, I believe it is decoding in UTF8
ArtÃ®stÃ©
In order to get my original string back, I have to call utf8_decode
Artîsté
Is there a way to handle this conversion in objective-c?
You might be looking for this:
NSString *string = @"Artîsté"; // any string with non-ASCII characters in it
char const *string_as_latin1 = [string cStringUsingEncoding:NSISOLatin1StringEncoding];
or possibly this:
NSData *data_latin1 = [string dataUsingEncoding:NSISOLatin1StringEncoding allowLossyConversion:YES];
"I have a LATIN1 string."
I don't think you do. Assuming you are talking about PHP, json_encode() only accepts UTF-8 strings, and bails out if it hits a non-UTF-8 high-byte sequence:
json_encode("Art\xeest\xe9")
"Art"
json_encode("Art\xc3\xaest\xc3\xa9")
"Art\u00eest\u00e9"
I think you had a proper UTF-8 string to start with, then you encoded and decoded it to get the exact same UTF-8 string back. But then you're displaying it or processing it in another step you haven't shown us, that treats your string as if it were Latin-1.
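The round trip described in this answer, plus the Latin-1 conversion suggested in the previous one, can be checked with a short Python sketch. It uses Python's json module as a stand-in for PHP's json_encode/json_decode, so it only illustrates the principle.

```python
import json

text = "Artîsté"

# JSON-encoding escapes non-ASCII characters (ensure_ascii is on by default).
encoded = json.dumps(text)
print(encoded)            # "Art\u00eest\u00e9"

# Decoding gives back exactly the same string; nothing is lost.
decoded = json.loads(encoded)
print(decoded == text)    # True

# Converting to Latin-1 bytes is a separate, explicit step
# (comparable to dataUsingEncoding:NSISOLatin1StringEncoding).
latin1 = decoded.encode("latin-1")
print(latin1)                                      # b'Art\xeest\xe9'
print(len(latin1), len(decoded.encode("utf-8")))   # 7 9

# Characters outside Latin-1 need a lossy conversion.
print("œ".encode("latin-1", errors="replace"))     # b'?'
```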

how to check the string is UNICODE vb.net

Is there any way to check whether a string is Unicode using VB.NET?
Best Regards
inchikka
You need to read the file using the Encoding that the file is written in.
It appears to be a non Unicode file that you are trying to read as Unicode, or possibly a different Unicode encoding than the default UTF-8 (could be UTF-16 for example).
StreamWriter has several constructors that take an Encoding as a parameter.
You can do it by checking each character in the string against the 128 characters of the ASCII table. If a character is not found there, it is a non-ASCII character.
Is that what you mean?
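Both suggestions can be sketched briefly. The example below is in Python rather than VB.NET, and the function names is_ascii and decodes_as_utf8 are made up for this illustration: the first checks every character against the 128-character ASCII range, the second checks whether raw bytes (for example, file contents) are valid UTF-8.

```python
def is_ascii(text: str) -> bool:
    """Return True if every character is within the 128-character ASCII range."""
    return all(ord(ch) < 128 for ch in text)


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the raw bytes form valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_ascii("Artiste"))                           # True
print(is_ascii("Artîsté"))                           # False
print(decodes_as_utf8("Artîsté".encode("utf-8")))    # True
print(decodes_as_utf8("Artîsté".encode("latin-1")))  # False
```

Note that the byte-level check can only rule UTF-8 out; bytes that happen to be valid UTF-8 may still have been written in another encoding.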