Problem creating UTF8 text file with NSFileHandle - objective-c

I want to use NSFileHandle to write large text files to avoid handling very large NSStrings in memory. I'm having a problem where, after creating the file and opening it in the TextEdit app (Mac), it does not display the Unicode characters correctly. If I write the same text to a file using the NSString writeToFile:atomically:encoding:error: method, TextEdit displays everything correctly.
I'm opening both the files in Text Edit with the "opening files encoding" option set to automatic, so I'm not sure why one works and the other method doesn't. Is there some form of header to declare the format is UTF8?
// Standard string
NSString *myString = @"This is a test with a star character \u272d";
// This works fine
// Displays: "This is a test with a star character ✭" in Text Edit
[myString writeToFile:path atomically:YES encoding:NSUTF8StringEncoding error:NULL];
// This doesn't work
// Displays: "This is a test with a star character ‚ú≠" in Text Edit
[fileManager createFileAtPath:path contents:nil attributes:nil];
fileHandle = [NSFileHandle fileHandleForWritingAtPath:path];
[fileHandle writeData:[myString dataUsingEncoding:NSUTF8StringEncoding]];

The problem is not with your code, but with TextEdit: It doesn't try to decode the file as UTF-8 unless it has a UTF-8 BOM identifying it as such. Presumably, the first version of your code adds such a BOM. See this question for further discussion.
UTF-8 data generally should not include a BOM, so you probably shouldn't modify your code from the second version at all—it's working correctly. If opening the file in TextEdit has to work, you should be able to force the BOM by including it (\ufeff) explicitly at the start of the string, but, again, you should not do that unless you really need to.
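For the large-file case, a minimal sketch of writing the text piece by piece through NSFileHandle, with the BOM as an opt-in, might look like the following. The helper name WriteUTF8Chunks and its parameters are hypothetical and not from the original post. The BOM is written as the raw bytes EF BB BF, which is how U+FEFF is encoded in UTF-8.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: writes each string in `chunks` to `path` as UTF-8,
// one piece at a time, so the whole text never has to be in memory at once.
// When `withBOM` is YES, the three bytes EF BB BF are written first.
static BOOL WriteUTF8Chunks(NSString *path, NSArray *chunks, BOOL withBOM) {
    NSFileManager *manager = [NSFileManager defaultManager];
    if (![manager createFileAtPath:path contents:nil attributes:nil]) {
        return NO;
    }
    NSFileHandle *handle = [NSFileHandle fileHandleForWritingAtPath:path];
    if (handle == nil) {
        return NO;
    }
    if (withBOM) {
        const unsigned char bom[] = {0xEF, 0xBB, 0xBF};
        [handle writeData:[NSData dataWithBytes:bom length:sizeof bom]];
    }
    for (NSString *chunk in chunks) {
        NSData *encoded = [chunk dataUsingEncoding:NSUTF8StringEncoding];
        if (encoded == nil) {
            // The chunk could not be encoded; stop rather than write garbage.
            [handle closeFile];
            return NO;
        }
        [handle writeData:encoded];
    }
    [handle closeFile];
    return YES;
}
```

A call such as WriteUTF8Chunks(path, @[myString], NO) produces the same bytes as the second version in the question. Passing YES is only worth it if the file has to open correctly in TextEdit.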

Related

Objc-DataFile-Unreadable Substring-Unknown to any encoding

I have a data file built by a subsidiary application. I need to locate some substrings contained in the data file. They are identifiable by the character symbols delimiting them, for instance: *!substringqSxt. The substrings vary from one project to another, so I need to locate the delimiting symbols in order to read the substring that follows. I also printed the file in different encodings to find which one was used and matched the original data file; it turned out to be NSMacOSRomanStringEncoding.
I use rangeOfString: to locate the delimiting symbols. Here is my code:
char *debutAudio ="jjbj";
char *finAudio ="qSxt";
NSString *debutAudioConverted = [[NSString alloc]
initWithCString: debutAudio
encoding:NSMacOSRomanStringEncoding];
NSString *finAudioConverted = [[NSString alloc]
initWithCString: finAudio
encoding:NSMacOSRomanStringEncoding];
NSRange debutaudioRange =[dataFileContent rangeOfString:debutAudioConverted];
NSRange finaudioRange =[dataFileContent rangeOfString:finAudioConverted];
NSLog(@"range is %@",NSStringFromRange(debutaudioRange));
NSLog(@"range is %@",NSStringFromRange(finaudioRange));
Both NSLog calls print range is {9223372036854775807, 0}. That location is NSNotFound,
so the delimiting strings are not being located.
And if I ask to look for other strings contained in the file like "Settings" the rangeOfString will return the proper location and length.
I thought the file might contain multiple encodings, and tried converting with initWithCString: using every possible encoding, but nothing worked.
Also, if I open the file in TextEdit and use the "Find" function, it will not locate the delimiting string, but it will locate other words. My gut tells me it's related. I don't know where to look for information. Could the file be protected? I am reading a copy of it, though.
I have found where the problem occurs. The proper encoding is still Mac OS Roman. The problem is in the prefix string debutAudio, "jjbj": there is actually a tiny space, something like a quarter space, between each pair of characters. I have tried every Unicode space listed here: https://www.cs.tut.fi/~jkorpela/chars/spaces.html#adj
without any success. Next I will try to find some half or quarter space in Mac OS Roman and see whether that works.
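One way to check what actually sits between those characters is to look at the file's raw bytes instead of a decoded string. The sketch below is only a suggestion for investigating, not a confirmed diagnosis, and HexDump and FindBytes are hypothetical helper names. If, for example, the dump around the delimiter showed 6A 00 6A 00, the "tiny spaces" would be zero bytes rather than any Unicode space character.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: formats the bytes of `data` inside `range` as
// two-digit hex values, clamped to the end of the data.
static NSString *HexDump(NSData *data, NSRange range) {
    NSMutableString *out = [NSMutableString string];
    const unsigned char *bytes = (const unsigned char *)data.bytes;
    NSUInteger end = MIN(NSMaxRange(range), data.length);
    for (NSUInteger i = range.location; i < end; i++) {
        [out appendFormat:@"%02X ", (unsigned int)bytes[i]];
    }
    return out;
}

// Hypothetical helper: searches the raw data for an exact byte sequence.
// The returned location is NSNotFound when the bytes are absent.
static NSRange FindBytes(NSData *data, const void *needle, NSUInteger length) {
    NSData *pattern = [NSData dataWithBytes:needle length:length];
    return [data rangeOfData:pattern
                     options:0
                       range:NSMakeRange(0, data.length)];
}
```

Loading the file with [NSData dataWithContentsOfFile:path], finding a string that is known to match (such as "Settings"), and dumping the bytes near the expected delimiter position would show exactly which byte values to search for.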

objective c UTF8String not working with japanese

I would like to show the NSString below on my UILabel:
NSString *strValue=@"你好";
but I cannot show it on my UILabel; I get strange characters!
I use this code to show the text:
[NSString stringWithCString:[strValue UTF8String] encoding:NSUTF8StringEncoding];
I tried [NSString stringWithCString:[strValue cStringUsingEncoding:NSISOLatin1StringEncoding] encoding:NSUTF8StringEncoding] and it worked,
but I cannot show emoticons with cStringUsingEncoding:NSISOLatin1StringEncoding, so I have to use UTF8String.
Any help appreciated.
Your source file is in UTF-8, but the compiler you are using thinks it's ISO Latin-1. What you think is the string @"你好" is actually the string @"ä½ å¥½". But when you ask NSString to give you this back as ISO Latin-1 and then treat the result as UTF-8, you reverse the process the compiler took and end up with the original string.
One solution that you can use here is to tell your compiler what encoding your source file is in. There is a compiler flag (for GCC it's -finput-charset=UTF-8, not sure about clang) that will tell the compiler what encoding to use. Curiously, UTF-8 should be the default already, but perhaps you're overriding this with a locale.
A more portable solution is to use only ASCII in your source file. You can accomplish this by replacing the non-ASCII chars with a string escape using \u1234 or \U12345678. In your case, you'd use
NSString *strValue=@"\u4F60\u597D";
Of course, once you get your string constant to be correct, you can ditch the whole encoding stuff and just use strValue directly.
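The round trip described above can be reproduced at runtime. This is a small sketch with hypothetical helper names (MisreadUTF8AsLatin1 and UndoLatin1Misread); it imitates what the compiler is assumed to have done, rather than showing the compiler's actual code path.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: takes the UTF-8 bytes of `s` and decodes them as
// ISO Latin-1, which turns every byte into one (wrong) character.
static NSString *MisreadUTF8AsLatin1(NSString *s) {
    NSData *utf8 = [s dataUsingEncoding:NSUTF8StringEncoding];
    return [[NSString alloc] initWithData:utf8
                                 encoding:NSISOLatin1StringEncoding];
}

// Hypothetical helper: reverses the misread by turning the characters back
// into Latin-1 bytes and decoding those bytes as UTF-8. Returns nil when the
// string contains characters that Latin-1 cannot represent.
static NSString *UndoLatin1Misread(NSString *garbled) {
    NSData *bytes = [garbled dataUsingEncoding:NSISOLatin1StringEncoding];
    if (bytes == nil) {
        return nil;
    }
    return [[NSString alloc] initWithData:bytes
                                 encoding:NSUTF8StringEncoding];
}
```

The nil case is why the Latin-1 workaround fails for emoticons: a string constant that already holds characters outside Latin-1 cannot be converted to Latin-1 bytes at all.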

NSUnicodeStringEncoding prepends FFFE to every string

I'm trying to append a string to a file by encoding it as NSUnicodeStringEncoding first. I'm doing this:
NSData *data = [@"data" dataUsingEncoding: NSUnicodeStringEncoding];
NSFileHandle *output = [NSFileHandle fileHandleForUpdatingAtPath:@"file"];
[output seekToEndOfFile];
[output writeData:data];
If I do this a number of times and then take a look at the file I notice that every string added has FFFE prepended to it. But when I switch from NSUnicodeStringEncoding to NSUTF8StringEncoding this prefix goes away.
That's called a byte-order marker, and is put there because NSUnicodeStringEncoding doesn't specify whether the characters are stored in big or little endian order.
To prevent 0xFFFE or 0xFEFF from appearing at the beginning of a string, use one of NSUTF16BigEndianStringEncoding, NSUTF16LittleEndianStringEncoding, NSUTF32BigEndianStringEncoding, or NSUTF32LittleEndianStringEncoding, depending on your specific needs. (For reference: Intel and ARM processors as used by Apple are little endian.)
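As a minimal sketch of that suggestion, assuming little-endian UTF-16 is what the file should contain, the two hypothetical helpers below encode a string without a byte-order mark and check whether a piece of data starts with one.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: encodes `text` as UTF-16 little endian. Because the
// byte order is named explicitly, no byte-order mark is added.
static NSData *UTF16LEDataWithoutBOM(NSString *text) {
    return [text dataUsingEncoding:NSUTF16LittleEndianStringEncoding];
}

// Hypothetical helper: reports whether `data` begins with a UTF-16
// byte-order mark in either byte order (FF FE or FE FF).
static BOOL StartsWithUTF16BOM(NSData *data) {
    if (data.length < 2) {
        return NO;
    }
    const unsigned char *b = (const unsigned char *)data.bytes;
    return (b[0] == 0xFF && b[1] == 0xFE) || (b[0] == 0xFE && b[1] == 0xFF);
}
```

When appending repeatedly, every call has to use the same byte order. If a reader of the file needs the marker, it can be written once when the file is created instead of with every string.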

rtf bullet character in objective c

Does anyone know if this line of code would work for a NSString from an rtf file on iOS?
NSString* cList = [[NSString alloc] initWithContentsOfFile:@"name of file" encoding:NSUTF8StringEncoding error:nil];
c = [cList componentsSeparatedByString:@"\n• "];
I'm just wondering since it includes a bullet point character which I pretty much copy pasted. I wasn't expecting it to be that easy. It just seems like it should be an escape sequence character or something.
Probably should've included some form of error checking in the first line, but that aside for the moment.
Update: After much compiling with no success with an rtf, I copied the text into a txt and used that instead. Works the first time. Seemed like the rtf reading was getting weird rtf data that wasn't really what I was after when I tried to NSLog it.
Thanks!
How about using a Unicode escape sequence?
For the bullet character, something like:
c = [cList componentsSeparatedByString:@"\n\u2022 "];
2022 is the Unicode code point of the bullet.
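Putting the pieces of this thread together, a sketch that reads a plain-text (not RTF) UTF-8 file with the error checking the question mentions, and splits on an escaped bullet, could look like this. BulletItemsFromTextFile is a hypothetical name, and U+2022 is assumed to be the bullet character used in the file.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: reads a UTF-8 plain-text file and splits it on
// "newline, bullet (U+2022), space". Returns nil and fills `error` when the
// file cannot be read or is not valid UTF-8.
static NSArray *BulletItemsFromTextFile(NSString *path, NSError **error) {
    NSString *contents = [NSString stringWithContentsOfFile:path
                                                   encoding:NSUTF8StringEncoding
                                                      error:error];
    if (contents == nil) {
        return nil;
    }
    return [contents componentsSeparatedByString:@"\n\u2022 "];
}
```

The first element of the result is whatever text precedes the first bullet, so callers may want to skip it.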

How to read the contents of .doc file into string in XCode?

How do I read the contents of .doc file [not from resource file] into NSString in Objective-C?
I tried doing it in this way:
NSString *str = [NSString stringWithContentsOfFile:@"/User/home/Documents/config.doc"];
NSLog(@"Contents of file : %@",str);
OUTPUT:
-+-% [encoded format]
output is in encoded format
How do I solve this problem? Is it not reading from file the proper contents or am I printing it wrong?
Thank You.
The file you are using is probably binary, so when viewed as a string it will not work like you think. You would need some sort of library or decoding function that would parse and display a binary .doc file.
Though you may have more luck with a .docx file which I believe is an XML based format that Word can save.
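A quick way to confirm that a file is binary before trying to read it as a string is to look at its first few bytes. The sketch below uses hypothetical names (WordFileKind, ClassifyWordFileHeader) and relies on two well-known signatures: legacy .doc files are OLE compound files starting with D0 CF 11 E0 A1 B1 1A E1, and .docx files are ZIP containers starting with "PK" (50 4B 03 04). Note that a .docx therefore also needs to be unpacked before its XML parts can be read as text.

```objc
#import <Foundation/Foundation.h>

// Hypothetical classification of a file based on its leading bytes.
typedef NS_ENUM(NSInteger, WordFileKind) {
    WordFileKindUnknown,
    WordFileKindLegacyDoc,     // OLE compound file, e.g. binary .doc
    WordFileKindZipContainer   // ZIP archive, e.g. .docx
};

// Hypothetical helper: inspects the first bytes of `header` and reports
// which container signature, if any, they match.
static WordFileKind ClassifyWordFileHeader(NSData *header) {
    static const unsigned char ole[] = {0xD0, 0xCF, 0x11, 0xE0,
                                        0xA1, 0xB1, 0x1A, 0xE1};
    static const unsigned char zip[] = {0x50, 0x4B, 0x03, 0x04};
    const unsigned char *b = (const unsigned char *)header.bytes;
    if (header.length >= sizeof ole && memcmp(b, ole, sizeof ole) == 0) {
        return WordFileKindLegacyDoc;
    }
    if (header.length >= sizeof zip && memcmp(b, zip, sizeof zip) == 0) {
        return WordFileKindZipContainer;
    }
    return WordFileKindUnknown;
}
```

Passing the result of [NSData dataWithContentsOfFile:path] is enough for a small file. Neither signature proves the file is a Word document, only that it is not plain text.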