Objective-C: NSString not being entirely decoded from UTF-8 - objective-c

I'm querying a web server which returns a JSON string as NSData. The string is in UTF-8 format so it is converted to an NSString like this.
NSString *receivedString = [[NSString alloc] initWithData:receivedData encoding:NSUTF8StringEncoding];
However, some UTF-8 escapes remain in the outputted JSON string which causes my app to behave erratically. Things like \u2019 remain in the string. I've tried everything to remove them and replace them with their actual characters.
The only thing I can think of is to replace the occurrences of UTF-8 escapes with their characters manually, but that is a lot of work if there's a quicker way!
Here's an example of an incorrectly parsed string:
{"title":"The Concept, Framed, The Enquiry, Delilah\u2019s Number 10 ","url":"http://livebrum.co.uk/2012/05/31/the-concept-framed-the-enquiry-delilah\u2019s-number-10","date_range":"31 May 2012","description":"","venue":{"title":"O2 Academy 3 ","url":"http://livebrum.co.uk/venues/o2-academy-3"}
As you can see, the URL hasn't been completely converted.
Thanks,

The \u2019 syntax isn't part of the UTF-8 encoding; it's a piece of JSON-specific syntax. NSString decodes UTF-8, not JSON, so it doesn't understand the escape.
You should use NSJSONSerialization to parse the JSON then pull the string you want from the output of that.
So, for example:
NSError *error = nil;
id rootObject = [NSJSONSerialization
                    JSONObjectWithData:receivedData
                    options:0
                    error:&error];
if (rootObject == nil)
{
    // error path here; the details are in error
}
// really you'd validate this properly, but this is just
// an example so I'm going to assume:
//
// (1) the root object is a dictionary;
// (2) it has a string in it named 'url'
//
// (technically this code will work no matter what the type
// of the url object as written, but if you carry forward assuming
// a string then you could be in trouble)
NSDictionary *rootDictionary = rootObject;
NSString *url = [rootDictionary objectForKey:@"url"];
NSLog(@"URL was: %@", url);
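The validation that the comments above skip could be sketched roughly as follows. This is only an illustration: the helper name StringForKeyInJSONData is hypothetical and not part of Foundation, and the code assumes ARC.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not a Foundation API): returns the string stored
// under `key` in a JSON object, or nil if the data is missing, is not
// valid JSON, has a non-dictionary root, or holds a non-string value.
static NSString *StringForKeyInJSONData(NSData *data, NSString *key) {
    if (data == nil) {
        return nil; // JSONObjectWithData: raises an exception for nil data
    }
    NSError *error = nil;
    id root = [NSJSONSerialization JSONObjectWithData:data options:0 error:&error];
    if (root == nil) {
        NSLog(@"JSON parse failed: %@", error);
        return nil;
    }
    if (![root isKindOfClass:[NSDictionary class]]) {
        return nil; // the root was an array, not an object
    }
    id value = [(NSDictionary *)root objectForKey:key];
    // The JSON parser has already turned \uXXXX escapes into real characters.
    return [value isKindOfClass:[NSString class]] ? value : nil;
}
```

Calling StringForKeyInJSONData(receivedData, @"url") would then return either a decoded string or nil, so the caller has a single failure case to handle.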

Related

Is there a way to "auto detect" the encoding of a resource when loading it using stringFromContentsOfURL?

The current (non-deprecated) method, + (id)stringWithContentsOfURL:(NSURL *)url encoding:(NSStringEncoding)enc error:(NSError **)error;, requires an encoding to be specified. I've noticed that getting it wrong does make a difference for what I want to do. Is there a way to detect the encoding so that I always get it right? (Right now I'm using UTF-8.)
I'd try this function from the docs
Returns a string created by reading data from a given URL and returns by reference the encoding used to interpret the data.
+ (id)stringWithContentsOfURL:(NSURL *)url usedEncoding:(NSStringEncoding *)enc error:(NSError **)error
This appears to guess the encoding and then return it to you by reference.
What I normally do when converting data (encoding-less string of bytes) to a string is attempt to initialize the string using various different encodings. I would suggest trying the most limiting (charset wise) encodings like ASCII and UTF-8 first, then attempt UTF-16. If none of those are a valid encoding, you should attempt to decode the string using a fallback encoding like NSWindowsCP1252StringEncoding that will almost always work. In order to do this you need to download the page's contents using NSData so that you don't have to re-download for every encoding attempt. Your code might look like this:
NSData * urlData = [NSData dataWithContentsOfURL:aURL];
NSString * theString = [[NSString alloc] initWithData:urlData encoding:NSASCIIStringEncoding];
if (!theString) {
    theString = [[NSString alloc] initWithData:urlData encoding:NSUTF8StringEncoding];
}
if (!theString) {
    theString = [[NSString alloc] initWithData:urlData encoding:NSUTF16StringEncoding];
}
if (!theString) {
    // last-resort fallback encoding
    theString = [[NSString alloc] initWithData:urlData encoding:NSWindowsCP1252StringEncoding];
}
// ...
// use theString here...
// ...
[theString release];

Unable to retrieve certain pages using stringWithContentsOfURL

I am trying to get HTML files from the web, using stringWithContentsOfURL:. My problem is, sometimes it works but sometimes it doesn't. For example, I tried:
NSString *string = [NSString stringWithContentsOfURL:
                       [NSURL URLWithString:@"http://www.google.com/"]
                       encoding:encoding1
                       error:nil];
NSLog(@"html = %@",string);
This works fine, but when I replace the URL with @"http://www.youtube.com/" then I only get "NULL". Is there anyone that knows what's going on? Is it because of YouTube having some sort of protection?
Google's home page uses ISO-8859-1 encoding (aka "Latin-1", or NSISOLatin1StringEncoding). YouTube uses UTF-8 (NSUTF8StringEncoding), and the encoding you've specified with your encoding1 variable has to match the web page in question.
If you just want the web page and don't really care what encoding it's in, try this:
NSStringEncoding encoding;
NSError *error = nil;
NSString *string = [NSString stringWithContentsOfURL:
                       [NSURL URLWithString:@"http://www.google.com/"]
                       usedEncoding:&encoding
                       error:&error];
NSLog(@"html = %@",string);
This method will tell you what the encoding was (by writing it to the encoding variable), but you can just throw that away and focus on the string.

Read and write an integer to/from a .txt file

How can I read and write an integer to and from a text file, and is it possible to read or write to multiple lines, i.e., deal with multiple integers?
Thanks.
This is certainly possible; it simply depends on the exact format of the text file.
Reading the contents of a text file is easy:
// If you want to handle an error, don't pass NULL to the following code, but rather an NSError pointer.
NSString *contents = [NSString stringWithContentsOfFile:@"/path/to/file" encoding:NSUTF8StringEncoding error:NULL];
That creates an autoreleased string containing the entire file. If all the file contains is an integer, you can just write this:
NSInteger integer = [contents integerValue];
If the file is split up into multiple lines (with each line containing one integer), you'll have to split it up:
NSArray *lines = [contents componentsSeparatedByCharactersInSet:[NSCharacterSet newlineCharacterSet]];
for (NSString *line in lines) {
    NSInteger currentInteger = [line integerValue];
    // Do something with the integer.
}
Overall, it's very simple.
Writing back to a file is just as easy. Once you've manipulated what you wanted back into a string, you can just use this:
NSString *newContents = ...; // New string.
[newContents writeToFile:@"/path/to/file" atomically:YES encoding:NSUTF8StringEncoding error:NULL];
You can use that to write the string to a file. Of course, you can adjust the arguments. Setting atomically to YES causes the string to be written to an auxiliary file first, which is then renamed to the destination path once the write succeeds (this ensures that if some failure happens, you won't end up with a corrupt file). If you want, you can use a different encoding (though NSUTF8StringEncoding is highly recommended), and if you want to catch errors (which you should), you can pass a reference to an NSError to the method. It would look something like this:
NSError *error = nil;
BOOL success = [newContents writeToFile:@"someFile.txt" atomically:YES encoding:NSUTF8StringEncoding error:&error];
if (!success) {
    // Some error has occurred. Handle it (the details are in error).
}
For further reading, consult the NSString Class Reference.
If you have to write to multiple lines, use \r\n when building the newContents string to specify where line breaks are to be placed.
NSMutableString *newContents = [[NSMutableString alloc] init];
for (/* loop conditions here */)
{
    NSString *lineString = ...; // do stuff to put important info for this line
    [newContents appendString:lineString];
    [newContents appendString:@"\r\n"];
}
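Putting the reading and writing halves together, a round trip for several integers might look like the sketch below. The helper names WriteIntegersToFile and ReadIntegersFromFile are hypothetical, and the code assumes ARC and one integer per line. The reader skips empty lines because splitting on newlineCharacterSet treats \r and \n as separate delimiters, so a file written with \r\n yields an empty string between each pair of lines, and integerValue would report each of those as 0.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: writes one integer per line to `path`.
static BOOL WriteIntegersToFile(NSArray *numbers, NSString *path, NSError **error) {
    NSMutableString *contents = [NSMutableString string];
    for (NSNumber *number in numbers) {
        [contents appendFormat:@"%ld\r\n", (long)[number integerValue]];
    }
    return [contents writeToFile:path
                      atomically:YES
                        encoding:NSUTF8StringEncoding
                           error:error];
}

// Hypothetical helper: reads the integers back as NSNumber objects,
// skipping empty lines. Returns nil if the file cannot be read.
static NSArray *ReadIntegersFromFile(NSString *path, NSError **error) {
    NSString *contents = [NSString stringWithContentsOfFile:path
                                                   encoding:NSUTF8StringEncoding
                                                      error:error];
    if (contents == nil) {
        return nil;
    }
    NSMutableArray *numbers = [NSMutableArray array];
    NSCharacterSet *newlines = [NSCharacterSet newlineCharacterSet];
    NSCharacterSet *spaces = [NSCharacterSet whitespaceCharacterSet];
    for (NSString *line in [contents componentsSeparatedByCharactersInSet:newlines]) {
        NSString *trimmed = [line stringByTrimmingCharactersInSet:spaces];
        if ([trimmed length] == 0) {
            continue; // gap between \r and \n, or a trailing blank line
        }
        [numbers addObject:[NSNumber numberWithInteger:[trimmed integerValue]]];
    }
    return numbers;
}
```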

objective c - does not read utf-8 encoded file

I'm trying to display some Japanese text on the iOS simulator and an iPod touch. The text is read from an XML file. The header is:
<?xml version="1.0" encoding="utf-8"?>
When the text is in English, it displays fine. However, when the text is Japanese, it comes out as an unintelligible mishmash of single-byte characters.
I have tried saving the file specifically as Unicode using TextEdit. I'm using NSXMLParser to parse the data. Any ideas would be much appreciated.
Here is the parsing code
// Override point for customization after application launch.
NSString *xmlFilePath = [[[NSBundle mainBundle] resourcePath] stringByAppendingPathComponent:@"questionsutf8.xml"];
NSString *xmlFileContents = [NSString stringWithContentsOfFile:xmlFilePath];
NSData *data = [NSData dataWithBytes:[xmlFileContents UTF8String] length:[xmlFileContents lengthOfBytesUsingEncoding: NSUTF8StringEncoding]];
XMLReader *xmlReader = [[XMLReader alloc] init];
[xmlReader parseXMLData: data];
stringWithContentsOfFile: is a deprecated method. It does not do encoding detection unless the file contains the appropriate byte order mark; otherwise, it interprets the file using the default C string encoding (the encoding returned by the +defaultCStringEncoding method). Instead, you should use the non-deprecated, encoding-detecting method stringWithContentsOfFile:usedEncoding:error:.
You can use it like this:
NSStringEncoding enc;
NSError *error;
NSString *xmlFileContents = [NSString stringWithContentsOfFile:xmlFilePath
                                                  usedEncoding:&enc
                                                         error:&error];
if (xmlFileContents == nil)
{
    NSLog (@"%@", error);
    return;
}
First, you should verify with TextWrangler (free from the Mac app store or barebones.com) that your XML file truly is UTF-8 encoded.
Second, try creating xmlFileContents with +stringWithContentsOfFile:encoding:error:, explicitly specifying UTF-8 encoding. Or, even better, bypass the intermediate string entirely, and create data with +dataWithContentsOfFile:.
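Bypassing the intermediate string could look roughly like the sketch below, which hands raw bytes straight to NSXMLParser so that the parser reads the encoding from the XML declaration itself. The TextCollector class and TextFromXMLData function are hypothetical stand-ins for the question's XMLReader, and the code assumes ARC.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical delegate that collects all character data the parser reports.
@interface TextCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableString *text;
@end

@implementation TextCollector
- (instancetype)init {
    if ((self = [super init])) {
        _text = [NSMutableString string];
    }
    return self;
}

// May be called several times per element, so append rather than assign.
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    [self.text appendString:string];
}
@end

// Parses raw XML bytes without converting them to an NSString first.
// Returns the concatenated character data, or nil if parsing fails.
static NSString *TextFromXMLData(NSData *data) {
    if (data == nil) {
        return nil;
    }
    TextCollector *collector = [[TextCollector alloc] init];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
    parser.delegate = collector;
    return [parser parse] ? [collector.text copy] : nil;
}
```

With the file from the question, the call would be TextFromXMLData([NSData dataWithContentsOfFile:xmlFilePath]).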

Text encoding problem between NSImage, NSData, and NSXMLDocument

I'm attempting to take an NSImage and convert it to a string which I can write in an XML document.
My current attempt looks something like this:
[xmlDocument setCharacterEncoding: @"US-ASCII"];
NSData* data = [image TIFFRepresentation];
NSString* string = [[NSString alloc] initWithData:data encoding:NSASCIIStringEncoding];
//Put string inside of NSXMLElement, write out NSXMLDocument.
Reading back in looks something like this:
NSXMLDocument* newXMLDocument = [[NSXMLDocument alloc] initWithData:data options:0 error:outError];
//Here's where it fails. I get:
//Error Domain=NSXMLParserErrorDomain Code=9 UserInfo=0x100195310 "Line 7: Char 0x0 out of allowed range"
I assume I'm missing something basic. What's up with this encoding issue?
First of all, embedding large amounts of binary data in XML is not a good idea, IMHO.
To answer your question, you need an encoding scheme that supports binary data, such as Base64.
See this page for more than one way to represent arbitrary NSData as a Base64-encoded string: http://www.cocoadev.com/index.pl?BaseSixtyFour
UPDATE: The link to Colloquy's NSData additions seems to be broken on that page. Here's the new URL: http://colloquy.info/project/browser/trunk/Additions/NSDataAdditions.m
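On newer SDKs (iOS 7 and OS X 10.9 onwards), NSData has Base64 support built in, so a third-party category may not be needed. The sketch below assumes such an SDK and ARC; the two wrapper function names are hypothetical and exist only to keep the example short.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical wrapper: encodes arbitrary bytes (for example the result
// of -TIFFRepresentation) as Base64 text that is safe to place in XML.
static NSString *Base64StringFromData(NSData *data) {
    return [data base64EncodedStringWithOptions:0];
}

// Hypothetical wrapper: decodes the text back into the original bytes.
// Whitespace added by XML pretty-printing is ignored. Returns nil if
// the string is not valid Base64.
static NSData *DataFromBase64String(NSString *string) {
    return [[NSData alloc] initWithBase64EncodedString:string
                                               options:NSDataBase64DecodingIgnoreUnknownCharacters];
}
```

The encoded string uses only ASCII letters, digits, '+', '/' and '=', so zero bytes in the image data can no longer trigger the "Char 0x0 out of allowed range" parser error.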