Objective-C does not read UTF-8 encoded file

I'm trying to display some Japanese text on the iOS Simulator and an iPod touch. The text is read from an XML file. The header is:
<?xml version="1.0" encoding="utf-8"?>
When the text is in English, it displays fine. However, when the text is Japanese, it comes out as an unintelligible mishmash of single-byte characters.
I have tried saving the file specifically as Unicode using TextEdit. I'm using NSXMLParser to parse the data. Any ideas would be much appreciated.
Here is the parsing code:
// Override point for customization after application launch.
NSString *xmlFilePath = [[[NSBundle mainBundle] resourcePath] stringByAppendingPathComponent:@"questionsutf8.xml"];
NSString *xmlFileContents = [NSString stringWithContentsOfFile:xmlFilePath];
NSData *data = [NSData dataWithBytes:[xmlFileContents UTF8String] length:[xmlFileContents lengthOfBytesUsingEncoding: NSUTF8StringEncoding]];
XMLReader *xmlReader = [[XMLReader alloc] init];
[xmlReader parseXMLData: data];

stringWithContentsOfFile: is deprecated. It does not detect the encoding unless the file starts with the appropriate byte order mark; otherwise it interprets the file using the default C string encoding (the encoding returned by +defaultCStringEncoding). Use the non-deprecated, encoding-detecting method stringWithContentsOfFile:usedEncoding:error: instead.
You can use it like this:
NSStringEncoding enc;
NSError *error = nil;
NSString *xmlFileContents = [NSString stringWithContentsOfFile:xmlFilePath
                                                  usedEncoding:&enc
                                                         error:&error];
if (xmlFileContents == nil)
{
    NSLog(@"%@", error);
    return;
}

First, verify with TextWrangler (free from the Mac App Store or barebones.com) that your XML file truly is UTF-8 encoded.
Second, try creating xmlFileContents with +stringWithContentsOfFile:encoding:error:, explicitly specifying UTF-8 encoding. Or, even better, bypass the intermediate string entirely, and create data with +dataWithContentsOfFile:.
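A minimal sketch of that last suggestion, loading the file's bytes directly so that no string encoding is guessed along the way. The helper name LoadXMLData is made up for illustration; XMLReader and parseXMLData: are the asker's own class and method.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the raw bytes of the file at `path`,
// or nil (with *error set) if the file cannot be read.
// No string decoding happens here, so the XML parser can honor the
// encoding declared in the file's own <?xml ... encoding="utf-8"?> header.
static NSData *LoadXMLData(NSString *path, NSError **error)
{
    return [NSData dataWithContentsOfFile:path options:0 error:error];
}
```

In the asker's code this would stand in for the two lines that build xmlFileContents and data, i.e. NSData *data = LoadXMLData(xmlFilePath, &error); followed by the existing [xmlReader parseXMLData:data]; call.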

Related

Replace %20 with space character while saving file iOS

I am recording an audio file and saving it to the Documents directory. The given name contains space characters. When I save the file to the Documents directory, the spaces are changed to %20. I would like to know how to save the file so that its name keeps the space characters.
NSString *myDBnew = [documentsDirectory stringByAppendingPathComponent:recordingAudioName ];
NSURL *recordedTmpFile = [NSURL fileURLWithPath:myDBnew];
Edited - This is to do audio recording and save to local directory.
recorder = [[ AVAudioRecorder alloc] initWithURL:recordedTmpFile settings:recordSetting error:&error];
You're using NSURL, and spaces are invalid characters in a URL. That's why each one is converted to %20.
Please use
-(BOOL)writeToFile:options:error:
instead of
-(BOOL)writeToURL:options:error:.
writeToFile: takes an NSString as its argument (not an NSURL). The spaces are preserved this way.
NSString *myDBnew = [documentsDirectory stringByAppendingPathComponent:recordingAudioName];
NSError *error = nil;
if (![data writeToFile:myDBnew options:0 error:&error]) {
    // Error handling
}
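It may also be worth checking where the %20 actually shows up. As far as I know, a file URL percent-encodes the space only in its string form, and its path property gives back the plain path. A small sketch of that check; the helper name and the sample path in the note below are made up for illustration.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the plain filesystem path behind a file URL.
// -path undoes the percent-encoding, so "%20" reads as a space again.
static NSString *PlainPathForFileURL(NSURL *fileURL)
{
    return [fileURL path];
}
```

For a URL built with [NSURL fileURLWithPath:@"/tmp/my recording.caf"], absoluteString contains my%20recording.caf, while PlainPathForFileURL returns the path with the space intact.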

Objective-C: NSString not being entirely decoded from UTF-8

I'm querying a web server which returns a JSON string as NSData. The string is in UTF-8 format so it is converted to an NSString like this.
NSString *receivedString = [[NSString alloc] initWithData:receivedData encoding:NSUTF8StringEncoding];
However, some UTF-8 escapes remain in the outputted JSON string which causes my app to behave erratically. Things like \u2019 remain in the string. I've tried everything to remove them and replace them with their actual characters.
The only thing I can think of is to replace the occurrences of these escapes with their characters manually, but that is a lot of work if there's a quicker way!
Here's an example of an incorrectly parsed string:
{"title":"The Concept, Framed, The Enquiry, Delilah\u2019s Number 10 ","url":"http://livebrum.co.uk/2012/05/31/the-concept-framed-the-enquiry-delilah\u2019s-number-10","date_range":"31 May 2012","description":"","venue":{"title":"O2 Academy 3 ","url":"http://livebrum.co.uk/venues/o2-academy-3"}
As you can see, the URL hasn't been completely converted.
Thanks,
The \u2019 syntax isn't part of UTF-8 encoding, it's a piece of JSON-specific syntax. NSString parses UTF-8, not JSON, so doesn't understand it.
You should use NSJSONSerialization to parse the JSON then pull the string you want from the output of that.
So, for example:
NSError *error = nil;
id rootObject = [NSJSONSerialization
                    JSONObjectWithData:receivedData
                    options:0
                    error:&error];
if (rootObject == nil)
{
    // error path here; `error` describes what went wrong
}
// really you'd validate this properly, but this is just
// an example so I'm going to assume:
//
// (1) the root object is a dictionary;
// (2) it has a string in it named 'url'
//
// (technically this code will work no matter what the type
// of the url object as written, but if you carry forward assuming
// a string then you could be in trouble)
NSDictionary *rootDictionary = rootObject;
NSString *url = [rootDictionary objectForKey:@"url"];
NSLog(@"URL was: %@", url);
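If you do want the validation that the comments above skip, a sketch along these lines might do. The helper name is made up, and it only covers the two assumptions listed (dictionary root, string value under 'url').

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: parses JSON data and returns the top-level "url"
// string, or nil if parsing fails or the value is missing or not a string.
static NSString *URLStringFromJSONData(NSData *jsonData, NSError **error)
{
    if (jsonData == nil) {
        return nil; // NSJSONSerialization raises an exception on nil data
    }

    id rootObject = [NSJSONSerialization JSONObjectWithData:jsonData
                                                    options:0
                                                      error:error];
    if (![rootObject isKindOfClass:[NSDictionary class]]) {
        return nil; // parse failure (rootObject is nil) or unexpected root type
    }

    id url = [(NSDictionary *)rootObject objectForKey:@"url"];
    if (![url isKindOfClass:[NSString class]]) {
        return nil; // key missing, or the value is not a string
    }
    return url;
}
```

The JSON parser turns escapes such as \u2019 into the real character while it builds the string, so nothing needs to be replaced by hand afterwards.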

Is there a way to "auto detect" the encoding of a resource when loading it using stringWithContentsOfURL:?

Is there a way to "auto detect" the encoding of a resource when loading it using stringWithContentsOfURL:? The current (non-deprecated) method, + (id)stringWithContentsOfURL:(NSURL *)url encoding:(NSStringEncoding)enc error:(NSError **)error;, requires an explicit encoding. I've noticed that getting it wrong does make a difference for what I want to do. Is there a way to check this somehow and always get it right? (Right now I'm using UTF-8.)
I'd try this method from the docs:
"Returns a string created by reading data from a given URL and returns by reference the encoding used to interpret the data."
+ (id)stringWithContentsOfURL:(NSURL *)url usedEncoding:(NSStringEncoding *)enc error:(NSError **)error
It seems to guess the encoding and then return it to you through the enc parameter.
What I normally do when converting data (encoding-less string of bytes) to a string is attempt to initialize the string using various different encodings. I would suggest trying the most limiting (charset wise) encodings like ASCII and UTF-8 first, then attempt UTF-16. If none of those are a valid encoding, you should attempt to decode the string using a fallback encoding like NSWindowsCP1252StringEncoding that will almost always work. In order to do this you need to download the page's contents using NSData so that you don't have to re-download for every encoding attempt. Your code might look like this:
NSData *urlData = [NSData dataWithContentsOfURL:aURL];
NSString *theString = [[NSString alloc] initWithData:urlData encoding:NSASCIIStringEncoding];
if (!theString) {
    theString = [[NSString alloc] initWithData:urlData encoding:NSUTF8StringEncoding];
}
if (!theString) {
    theString = [[NSString alloc] initWithData:urlData encoding:NSUTF16StringEncoding];
}
if (!theString) {
    theString = [[NSString alloc] initWithData:urlData encoding:NSWindowsCP1252StringEncoding];
}
// ...
// use theString here...
// ...
[theString release]; // manual reference counting only; omit under ARC
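The same fallback chain could be written as a loop over candidate encodings, which also reports which one worked. This is only a sketch: the helper name is made up, it assumes ARC, and it leaves out the separate ASCII attempt on the assumption that UTF-8 already accepts every pure-ASCII input.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: tries each candidate encoding in order and returns the
// first string that decodes, writing the encoding that worked to *usedEncoding.
// Order follows the answer above: strictest first, Windows-1252 as the last resort.
// Assumes ARC; under manual reference counting the caller owns the result.
static NSString *StringByTryingEncodings(NSData *data, NSStringEncoding *usedEncoding)
{
    const NSStringEncoding candidates[] = {
        NSUTF8StringEncoding,          // also covers plain ASCII, a subset of UTF-8
        NSUTF16StringEncoding,
        NSWindowsCP1252StringEncoding
    };
    const size_t count = sizeof(candidates) / sizeof(candidates[0]);

    for (size_t i = 0; i < count; i++) {
        NSString *string = [[NSString alloc] initWithData:data
                                                 encoding:candidates[i]];
        if (string != nil) {
            if (usedEncoding != NULL) {
                *usedEncoding = candidates[i];
            }
            return string;
        }
    }
    return nil; // none of the candidates could decode the bytes
}
```

Usage would look like NSString *html = StringByTryingEncodings(urlData, &enc); with urlData downloaded once via NSData, as described above.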

Unable to retrieve certain pages using stringWithContentsOfURL

I am trying to get HTML files from the web, using stringWithContentsOfURL:. My problem is, sometimes it works but sometimes it doesn't. For example, I tried:
NSString *string = [NSString stringWithContentsOfURL:
                       [NSURL URLWithString:@"http://www.google.com/"]
                       encoding:encoding1
                       error:nil];
NSLog(@"html = %@", string);
This works fine, but when I replace the URL with @"http://www.youtube.com/" then I only get "NULL". Does anyone know what's going on? Is it because YouTube has some sort of protection?
Google's home page uses ISO-8859-1 encoding (aka "Latin-1", or NSISOLatin1StringEncoding). YouTube uses UTF-8 (NSUTF8StringEncoding), and the encoding you've specified with your encoding1 variable has to match the web page in question.
If you just want the web page and don't really care what encoding it's in, try this:
NSStringEncoding encoding;
NSError *error = nil;
NSString *string = [NSString stringWithContentsOfURL:
                       [NSURL URLWithString:@"http://www.google.com/"]
                       usedEncoding:&encoding
                       error:&error];
NSLog(@"html = %@", string);
This method will tell you what the encoding was (by writing it to the encoding variable), but you can just throw that away and focus on the string.
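If you would rather not throw the encoding and the error away, a small wrapper can log both. A sketch, with a made-up helper name; it works with any NSURL, including file URLs.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: loads text from `url`, letting Foundation pick the
// encoding, and logs either the encoding that was used or the reason for failure.
static NSString *LoadStringDetectingEncoding(NSURL *url)
{
    NSStringEncoding encoding = 0;
    NSError *error = nil;
    NSString *string = [NSString stringWithContentsOfURL:url
                                            usedEncoding:&encoding
                                                   error:&error];
    if (string == nil) {
        NSLog(@"Could not load %@: %@", url, error);
        return nil;
    }
    NSLog(@"Loaded %@ as %@", url,
          [NSString localizedNameOfStringEncoding:encoding]);
    return string;
}
```

Checking the return value for nil (rather than passing error:nil and logging blindly) would have shown right away why the YouTube request came back empty.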

Text encoding problem between NSImage, NSData, and NSXMLDocument

I'm attempting to take an NSImage and convert it to a string which I can write in an XML document.
My current attempt looks something like this:
[xmlDocument setCharacterEncoding:@"US-ASCII"];
NSData* data = [image TIFFRepresentation];
NSString* string = [[NSString alloc] initWithData:data encoding:NSASCIIStringEncoding];
//Put string inside of NSXMLElement, write out NSXMLDocument.
Reading back in looks something like this:
NSXMLDocument* newXMLDocument = [[NSXMLDocument alloc] initWithData:data options:0 error:outError];
//Here's where it fails. I get:
//Error Domain=NSXMLParserErrorDomain Code=9 UserInfo=0x100195310 "Line 7: Char 0x0 out of allowed range"
I assume I'm missing something basic. What's up with this encoding issue?
First of all, embedding large amounts of binary data in XML is not a good idea, IMHO.
To answer your question, you need an encoding scheme that supports binary data, such as Base64.
See this page for more than one way to represent arbitrary NSData as a Base64-encoded string: http://www.cocoadev.com/index.pl?BaseSixtyFour
UPDATE: The link to Colloquy's NSData additions seems to be broken on that page. Here's the new URL: http://colloquy.info/project/browser/trunk/Additions/NSDataAdditions.m
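On newer systems you may not need a third-party category at all, since NSData has Base64 methods built in (iOS 7 and OS X 10.9 onward, as far as I know). A sketch with made-up helper names:

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helpers for storing binary data as XML-safe text, built on the
// Base64 methods that NSData itself provides (iOS 7 / OS X 10.9 and later).

// Encodes arbitrary bytes (for example [image TIFFRepresentation]) as Base64 text.
static NSString *Base64StringFromData(NSData *data)
{
    return [data base64EncodedStringWithOptions:0];
}

// Decodes Base64 text back into the original bytes.
// Returns nil if the text is not valid Base64.
static NSData *DataFromBase64String(NSString *base64)
{
    return [[NSData alloc] initWithBase64EncodedString:base64 options:0];
}
```

The Base64 string contains only ASCII letters, digits, '+', '/' and '=', so it can go inside an NSXMLElement without tripping the "Char 0x0 out of allowed range" error; on reading, decode the element's string value back to NSData before creating the image.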