Parsing a binary MOBI file: best approach? - objective-c

The file contains metadata in between binary data. I'm able to parse the first line with the title Agent_of_Chang2e, but I need to get the metadata at the bottom of the header as well. I know there is no standard specification for it.
This code isn't able to decode the bottom lines. For example, I get the following wrongly formatted text:
FÃHANGE</b1èrX)¯­ÌiadenÕniverse<sup><smalÀ|®¿8</¡Îovelÿ·?=SharonÌeeándÓteveÍiller8PblockquoteßßÚ>TIa÷orkyfiction.Áll#eãacÐ0hðortrayedén{n)áreïrzus0¢°usly.Ôhatíean0authhmxétlõp.7N_\
©ß© 1988âyÓOOKãsòeserved.0ðart)publicaZmayâehproduc
NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];
char buffer[1024];
FILE* file = fopen([path UTF8String], "r");
if (file != 0)
{
while(fgets(buffer, 1024, file) != NULL)
{
NSString* string = [[NSString alloc] initWithCString: buffer encoding:NSASCIIStringEncoding];
NSLog(@"%@",string);
[string release];
}
fclose(file);
}
[pool drain];

nielsbot already posted a link to the format specification.
As you can read there, the file is not a text file but a binary format. Parsing it with NSString instances is not a good idea.
You have to read the file as binary data, i.e. using NSData:
NSData *content = [NSData dataWithContentsOfFile:path];
Then you have to extract the relevant information yourself. For example, if you want to read the uncompressed text length, you will find in the linked document that this information starts at position 4 and has a length of 4.
uint32_t uncompressedTextLength; // 4 bytes are 32 bits.
[content getBytes:&uncompressedTextLength range:NSMakeRange(4, 4)];
You will probably have to deal with endianness: the format stores multi-byte integers in big-endian order, so on a little-endian host the value has to be byte-swapped (for example with CFSwapInt32BigToHost) before you use it.
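One way to sidestep the endianness question is to assemble the integer byte by byte. The following is only a sketch in plain C (which also compiles inside an Objective-C file); the helper name read_be32 is made up for this example, and it assumes the field is stored big-endian, as the PalmDB/MOBI headers are.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit big-endian unsigned integer from `bytes` at `offset`.
   Building the value from individual bytes gives the same result on
   little-endian and big-endian hosts, so no explicit swap is needed.
   The caller must make sure that offset + 4 does not exceed the buffer. */
static uint32_t read_be32(const unsigned char *bytes, size_t offset)
{
    return ((uint32_t)bytes[offset]     << 24) |
           ((uint32_t)bytes[offset + 1] << 16) |
           ((uint32_t)bytes[offset + 2] <<  8) |
            (uint32_t)bytes[offset + 3];
}
```

With an NSData object this could be called as read_be32([content bytes], 4), after checking that [content length] is at least 8. The offset 4 is the one quoted above; verify it against the specification for the record you are actually reading.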

Use NSTask or system() to pass the file through the strings utility and parse the output of that:
strings /bin/bash | more
...
...
677778899999999999999999999999999999:::;;<<====>>>>>>>>>>>????????
@(#)PROGRAM:bash PROJECT:bash-92
...
...

First, I am pretty sure the texts will be UTF-8 or UTF-16 encoded.
Second, you cannot just take random 1024 bytes and expect them to work as a text. What about byte order (big endian vs little endian)?

Related

Create file with fixed size in Cocoa

My Mac app needs to create a file with a fixed size. For example, I want to create a file named test.txt with a fixed size of 1069 bytes. How can I do that? I use the code below to write the file:
NSError *err;
NSString* arrayText = [writeArray componentsJoinedByString: @"\n"];
[filemgr createFileAtPath:[NSString stringWithFormat:@"%@/test.txt",sd_url] contents:nil attributes:nil];
[arrayText writeToFile:[NSString stringWithFormat:@"%@/test.txt",sd_url] atomically:YES encoding:NSUTF8StringEncoding error:&err];
Thanks
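This function will write some junk data to the file.
+(BOOL)writeToFile:(NSString *)path withSize:(size_t)bytes
{
FILE *file = fopen([path UTF8String], "wb");
if (file == NULL)
return NO;
void *data = malloc(bytes); // contents are uninitialized; use calloc if you want zeros
if (data == NULL) {
fclose(file); // don't leak the open file
return NO;
}
size_t written = fwrite(data, 1, bytes, file);
free(data); // the buffer is no longer needed
fclose(file);
return written == bytes;
}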
If you don't want to use fwrite:
+(BOOL)writeToFile:(NSString *)path withSize:(size_t)bytes
{
void *data = malloc(bytes); // contents are uninitialized
if (data == NULL) {
return NO;
}
// NSData takes ownership of the buffer and frees it when it is deallocated.
NSData *ldata = [NSData dataWithBytesNoCopy:data length:bytes freeWhenDone:YES];
return [ldata writeToFile:path atomically:NO];
}
In order for the file to contain a certain number of bytes, you must write that many bytes to the file. There's no way to make a file's size be something other than the number of bytes it contains.
You can use FSAllocateFork (or fcntl with F_PREALLOCATE) to reserve space to write into. You'd use this, for example, if you were implementing your own download logic (if, for some reason, NSURLDownload wasn't good enough) and wanted to (a) make sure enough space is available for the file you're downloading and (b) grab it before something else does.
But, even that doesn't actually change the size of the file, just ensures (if successful) that your writes will not fail for insufficient space.
The only way to truly grow a file is to write to it.
The best way to do that is to use either -[NSData writeToURL:options:error:], which is the easy way if you have the data all ready at once, or NSFileHandle, which will enable you to write the data in chunks rather than having to build it all up in memory first.
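The chunked approach can also be sketched with plain C stdio, which works unchanged in an Objective-C file. This is an illustration only: the function name write_fixed_size_file and the 4096-byte chunk size are arbitrary choices for the example, and the file is filled with zero bytes rather than junk.

```c
#include <stdio.h>
#include <string.h>

/* Write exactly `size` zero bytes to `path`, one small chunk at a time,
   so the whole file never has to be held in memory.
   Returns 0 on success, -1 on failure. */
static int write_fixed_size_file(const char *path, size_t size)
{
    FILE *file = fopen(path, "wb");
    if (file == NULL) {
        return -1;
    }

    char chunk[4096];
    memset(chunk, 0, sizeof chunk);

    size_t remaining = size;
    while (remaining > 0) {
        size_t n = remaining < sizeof chunk ? remaining : sizeof chunk;
        if (fwrite(chunk, 1, n, file) != n) {
            fclose(file);
            return -1;
        }
        remaining -= n;
    }
    /* fclose flushes buffered data, so its result matters too. */
    return fclose(file) == 0 ? 0 : -1;
}
```

For the file in the question this would be called as write_fixed_size_file([path UTF8String], 1069), where path is the NSString holding the destination.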

AVAudioPlayer refuses to play modified wav file

The first time I call this method, file1 will be nil and file2 will be returned. When this happens the file plays normally (so the way this method is called should be fine). But when I call it a second time, it returns an NSURL which the AVAudioPlayer does not play. My guess is that I have missed something in the header. In the debugger I have seen that totalLength is exactly as long as the data's length.
+(NSURL *)mergeFile1:(NSURL *)file1 withFile2:(NSURL *)file2 {
if(file1 == nil) {
return [file2 copy];
}
NSData * wav1Data = [NSData dataWithContentsOfURL:file1];
NSData * wav2Data = [NSData dataWithContentsOfURL:file2];
int wav1DataSize = [wav1Data length] - 46;
int wav2DataSize = [wav2Data length] - 46;
if (wav1DataSize <= 0 || wav2DataSize <= 0) {
return nil;
}
NSMutableData * soundFileData = [NSMutableData dataWithData:[wav1Data subdataWithRange:NSMakeRange(0, 46)]];
[soundFileData appendData:[wav1Data subdataWithRange:NSMakeRange(46, wav1DataSize)]];
[soundFileData appendData:[wav2Data subdataWithRange:NSMakeRange(46, wav2DataSize)]];
unsigned int totalLength = [soundFileData length];
NSLog(@"Calculated: %d - Real: %d", totalLength, [soundFileData length]);
[soundFileData replaceBytesInRange:NSMakeRange(4, 4)
withBytes:&(UInt32){NSSwapHostIntToLittle(totalLength-8)}];
[soundFileData replaceBytesInRange:NSMakeRange(42, 4)
withBytes:&(UInt32){NSSwapHostIntToLittle(totalLength)}];
[soundFileData writeToURL:file1 atomically:YES];
return [file1 copy];
}
If anyone sees something that can be of help it would be much appreciated!
Any questions will be answered asap.
EDIT
I know there are 2 sorts of wav headers: 44 bytes or 46 bytes. I have tried both.
EDIT
I have looked at the Audio File Services Reference, which contains a lot of nice stuff I might want to use, but I can't figure out how to use it all. I'm not very familiar with C. I hope someone can help me out with this.
EDIT
An example of a merged wav file is found here: 7--443522512
It looks like your WAV file includes a broken FLLR chunk before the data chunk, or at least VLC thinks the FLLR chunk is over 2 GB large, so it tries to skip to the next chunk, which lies beyond the end of the file.
Maybe you should try to create the WAV files without an FLLR chunk before merging; the kAudioFileFlags_DontPageAlignAudioData flag seems to make Audio File Services skip it.
Another option is to extract the data chunks and write a new WAV file. I did a proof-of-concept implementation here: https://gist.github.com/1555889
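Extracting the data chunk means walking the RIFF chunk list instead of assuming a fixed 44- or 46-byte header. The sketch below shows one way to do that in plain C; it is not the code from the linked gist, and the names read_le32 and find_wav_chunk are invented for this example. It assumes the standard RIFF layout: a 12-byte "RIFF", size, "WAVE" preamble followed by chunks that each have a 4-byte id and a 4-byte little-endian size.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit little-endian value (RIFF sizes are little-endian). */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Find the chunk with the four-character `id` in a RIFF/WAVE buffer.
   On success, stores the payload offset and size and returns 1.
   Returns 0 if the buffer is not RIFF/WAVE, the chunk is missing, or a
   chunk claims to be larger than the remaining data. */
static int find_wav_chunk(const unsigned char *buf, size_t len, const char *id,
                          size_t *payload_offset, uint32_t *payload_size)
{
    if (len < 12 || memcmp(buf, "RIFF", 4) != 0 ||
        memcmp(buf + 8, "WAVE", 4) != 0) {
        return 0;
    }
    size_t pos = 12;                     /* first chunk follows the preamble */
    while (len - pos >= 8) {
        uint32_t size = read_le32(buf + pos + 4);
        size_t body = pos + 8;
        if (size > len - body) {
            return 0;                    /* oversized chunk, e.g. a broken FLLR */
        }
        if (memcmp(buf + pos, id, 4) == 0) {
            *payload_offset = body;
            *payload_size = size;
            return 1;
        }
        pos = body + size + (size & 1u); /* chunks are padded to even length */
        if (pos > len) {
            return 0;
        }
    }
    return 0;
}
```

Calling find_wav_chunk with "data" on both files gives the two sample ranges to concatenate; the merged file then needs a fresh header whose RIFF size and data size fields match the new total.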

Objective-C: Reading contents of a file into an NSString object doesn't convert unicode

I have a file, which I'm reading into an NSString object using stringWithContentsOfFile. It contains Unicode for Japanese characters such as:
\u305b\u3044\u3075\u304f
which I believe is
せいふく
I would like my NSString object to store the string as the latter, but it is storing it as the former.
The thing I don't quite understand is that when I do this:
NSString *myString = [NSString stringWithContentsOfFile:path encoding:NSUTF8StringEncoding error:nil];
It stores it as: \u305b\u3044\u3075\u304f.
But when I hardcode in the string:
NSString *myString = @"\u305b\u3044\u3075\u304f";
It correctly converts it and stores it as: せいふく
Does stringWithContentsOfFile escape the Unicode in some way? Any help will be appreciated.
Thanks.
In the file \u305b\u3044\u3075\u304f are just normal characters. So you are getting them in string. You need to save actual Japanese characters in the file. That is, store せいふく in file and that will be loaded in the string.
You can try this; I don't know how feasible it is:
NSArray *unicodeArray = [stringFromFile componentsSeparatedByString:@"\\u"];
NSMutableString *finalString = [[NSMutableString alloc] initWithString:@""];
for (NSString *unicodeString in unicodeArray) {
if (![unicodeString isEqualToString:@""]) {
unsigned int codeValue = 0; // scanHexInt: expects an unsigned int, not a unichar
[[NSScanner scannerWithString:unicodeString] scanHexInt:&codeValue];
unichar character = (unichar)codeValue;
NSString* betaString = [NSString stringWithCharacters:&character length:1];
[finalString appendString:betaString];
}
}
//finalString should have せいふく
Something like \u305b in an Objective-C string is in fact an instruction to the compiler to replace it with the actual UTF-8 byte sequence for that character. The method reading the file is not a compiler, and only reads the bytes it finds. So to get that character (officially called "code point"), your file must contain the actual UTF-8 byte sequence for that character, and not the symbolic representation \u305b.
It's a bit like \x43. This is, in your source code, four characters, but it is replaced by one byte with value 0x43. So if you write @"\x43" to a file, the file will not contain the four characters '\', 'x', '4', '3', it will contain the single character 'C' (which has ASCII value 0x43).
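If the file cannot be changed and really contains the literal six-character sequences, the program has to do the compiler's job itself. Here is a minimal sketch in plain C, under the assumption that only BMP escapes of the form \uXXXX occur; the function name decode_u_escapes is hypothetical, and surrogate pairs are deliberately not handled.

```c
#include <ctype.h>
#include <stdlib.h>

/* Decode literal "\uXXXX" escapes in `in` to UTF-8 in `out`.
   Only BMP code points are handled; surrogate pairs are not combined.
   Everything else is copied through unchanged. `out` needs room for
   strlen(in) + 1 bytes, because a six-character escape never expands
   to more than three bytes. */
static void decode_u_escapes(const char *in, char *out)
{
    while (*in != '\0') {
        if (in[0] == '\\' && in[1] == 'u' &&
            isxdigit((unsigned char)in[2]) && isxdigit((unsigned char)in[3]) &&
            isxdigit((unsigned char)in[4]) && isxdigit((unsigned char)in[5])) {
            char hex[5] = { in[2], in[3], in[4], in[5], '\0' };
            unsigned long cp = strtoul(hex, NULL, 16);
            if (cp < 0x80) {                      /* one byte */
                *out++ = (char)cp;
            } else if (cp < 0x800) {              /* two bytes */
                *out++ = (char)(0xC0 | (cp >> 6));
                *out++ = (char)(0x80 | (cp & 0x3F));
            } else {                              /* three bytes */
                *out++ = (char)(0xE0 | (cp >> 12));
                *out++ = (char)(0x80 | ((cp >> 6) & 0x3F));
                *out++ = (char)(0x80 | (cp & 0x3F));
            }
            in += 6;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```

The resulting buffer can be handed to [NSString stringWithUTF8String:] to obtain the NSString with the real characters.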

Decoding word-encoded Content-Disposition header file name in Objective-C

I am trying to retrieve a file name that can't be represented in ASCII from the content-disposition header.
This file name is word-encoded. Below is the encoded file name:
=?UTF-8?Q?=C3=ABst=C3=A9_=C3=A9_=C3=BAm_n=C3=B4m=C3=A9?= =?UTF-8?Q?_a=C3=A7ent=C3=BAad=C3=B5.xlsx?=
How do I get the decoded file name (that actually is "ësté é úm nômé açentúadõ.xlsx")?
PS: I am looking for an Objective-C implementation.
You probably want to search for a MIME handling framework, but I searched online and came up with nothing, so I'm just showing the algorithm here. It's not the best example since I'm making a big assumption, namely that the string is always UTF-8 Q-encoded.
Q-encoding is like URL-encoding (percent-encoding), which Foundation's NSString already has support for decoding. The only (practical) difference when decoding (there are bigger differences when encoding) is that % encodings are = encodings instead.
Then there's the lead-in and lead-out stuff. Each encoded block has the format =?charset-name?encoding-type? ... encoded string here ... ?=. You should really read the charset name and use that encoding, and you should really read the encoding-type, since it may be "Q" or "B" (Base64).
This example only works for Q-encoding (a subset of quoted-printable). You should be able to easily modify it to handle the different charsets and to handle Base64 encoding however.
#import <Foundation/Foundation.h>
int main(void) {
NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
NSString *encodedString = @"=?UTF-8?Q?=C3=ABst=C3=A9_=C3=A9_=C3=BAm_n=C3=B4m=C3=A9?= =?UTF-8?Q?_a=C3=A7ent=C3=BAad=C3=B5.xlsx?=";
NSScanner *scanner = [NSScanner scannerWithString:encodedString];
NSString *buf = nil;
NSMutableString *decodedString = [[NSMutableString alloc] init];
while ([scanner scanString:@"=?UTF-8?Q?" intoString:NULL]
|| ([scanner scanUpToString:@"=?UTF-8?Q?" intoString:&buf] && [scanner scanString:@"=?UTF-8?Q?" intoString:NULL])) {
if (buf != nil) {
[decodedString appendString:buf];
}
buf = nil;
NSString *encodedRange;
if (![scanner scanUpToString:@"?=" intoString:&encodedRange]) {
break; // Invalid encoding
}
[scanner scanString:@"?=" intoString:NULL]; // Skip the terminating "?="
// Decode the encoded portion (naively using UTF-8 and assuming it really is Q encoded)
// I'm doing this really naively, but it should work
// Firstly I'm encoding % signs so I can cheat and turn this into a URL-encoded string, which NSString can decode
encodedRange = [encodedRange stringByReplacingOccurrencesOfString:@"%" withString:@"=25"];
// Turn this into a URL-encoded string
encodedRange = [encodedRange stringByReplacingOccurrencesOfString:@"=" withString:@"%"];
// Replace the underscores with spaces
encodedRange = [encodedRange stringByReplacingOccurrencesOfString:@"_" withString:@" "];
[decodedString appendString:[encodedRange stringByReplacingPercentEscapesUsingEncoding:NSUTF8StringEncoding]];
}
NSLog(@"Decoded string = %@", decodedString);
[decodedString release];
[pool drain];
return 0;
}
This outputs:
chrisbook-pro:~ chris$ ./qp-decode
2010-12-01 18:54:42.903 qp-decode[9643:903] Decoded string = ësté é úm nômé açentúadõ.xlsx
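The percent-escape detour is not strictly necessary; the payload of a Q-encoded word can also be decoded byte by byte. The following is a rough sketch in plain C with a made-up function name, decode_q_payload. It assumes the lead-in and the trailing ?= have already been stripped and leaves charset handling to the caller.

```c
#include <ctype.h>
#include <stdlib.h>

/* Decode the payload of one Q-encoded word (the part between
   "=?charset?Q?" and "?=") into raw bytes in `out`, NUL-terminated.
   "=XX" becomes the byte 0xXX, "_" becomes a space, and anything else
   is copied as-is. `out` needs room for strlen(in) + 1 bytes. */
static void decode_q_payload(const char *in, char *out)
{
    while (*in != '\0') {
        if (in[0] == '=' && isxdigit((unsigned char)in[1]) &&
            isxdigit((unsigned char)in[2])) {
            char hex[3] = { in[1], in[2], '\0' };
            *out++ = (char)strtoul(hex, NULL, 16);
            in += 3;
        } else if (*in == '_') {
            *out++ = ' ';
            in++;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```

The decoded bytes are in whatever charset the encoded word declared; for UTF-8 they can be passed straight to [NSString stringWithUTF8String:].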
Created an easier / successful method here using a trick involving NSString percent escapes..
https://stackoverflow.com/a/10888548/285694
I recently implemented a NSString category that decodes MIME Encoded-Word with either Q-encoding or B-encoding.
The code is available on GitHub and is briefly explained in this answer.

Open file and read from file Objective-c

I'm trying to open a file, and read from it.. but I'm having some issues.
FILE *libFile = fopen("/Users/pineapple/Desktop/finalproj/test242.txt","r");
char wah[200];
fgets(wah, 200, libFile);
printf("%s test\n", wah);
this prints: \377\376N test rather than any of the contents of my file.
any idea why?
complete code:
#import <Cocoa/Cocoa.h>
#import <stdio.h>
int main(int argc, char *argv[])
{
NSAutoreleasePool* pool = [[NSAutoreleasePool alloc] init];
FILE *libFile = fopen("/Users/pineapple/Desktop/finalproj/test242.txt","r");
if(libFile){
char wah[200];
fgets(wah, 200, libFile);
printf("%s test\n", wah);
}
[pool drain];
return 0;
}
And the test242.txt doesn't contain more than 200 chars.
If this is for Objective-C, why not do something like:
use NSFileHandle:
NSString * path = @"/Users/pineapple/Desktop/finalproj/test242.txt";
NSFileHandle * fileHandle = [NSFileHandle fileHandleForReadingAtPath:path];
NSData * buffer = nil;
// readDataOfLength: returns an empty NSData (not nil) at end of file
while ([(buffer = [fileHandle readDataOfLength:1024]) length] > 0) {
//do something with the buffer
}
or use NSString:
NSString * fileContents = [NSString stringWithContentsOfFile:path encoding:NSUTF8StringEncoding error:nil];
or if you need to read it line-by-line:
How to read data from NSFileHandle line by line?
IMO, there's no need to drop down to the C-level file I/O functions unless you have a very, very good reason for doing so (i.e., opening a file using O_SHLOCK or something).
Your file is stored in UTF-16 (Unicode). The first character in your file is "L", which is code point 0x4C. The first 4 bytes of your file are FF FE 4C 00, which are a byte-order mark (BOM) and the letter L encoded in UTF-16 as two bytes.
fgets is not Unicode-aware, so it's looking for the newline character '\n', which is the byte 0x0A. Most likely this will happen on the first byte of a Unicode newline (the two bytes 0A 00), but it could also happen on plenty of other non-newline characters such as U+010A (LATIN CAPITAL LETTER C WITH DOT ABOVE) or anything in the Gurmukhi or Gujarati scripts (U+0A00 to U+0AFF).
In any case, though, the data that's ending up in the buffer wah has lots of embedded nulls and looks something like FF FE 4C 00 47 00 4F 00 4F 00 0A 00. NUL (0x00) is the C string terminator, so when you attempt to print this out using printf, it stops at the first null, and all you see is \377\376L. \377\376 is the octal representation of the bytes FF FE.
The fix for this is to convert your text file to a byte-oriented encoding such as ISO 8859-1 or UTF-8. Note that most single-byte encodings cannot encode the full range of Unicode characters, so if you need Unicode, I strongly recommend using UTF-8. Alternatively, you can convert your program to be Unicode-aware, but then you can no longer use a lot of standard library functions (such as fgets and printf), and you need to use wchar_t everywhere in place of char.
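If converting the file by hand is not an option, the conversion can also be done in code after reading the raw bytes (for example with fread). The function below is a sketch in plain C, not part of any library; the name utf16le_to_utf8 is invented here, and it assumes little-endian UTF-16, which is what the FF FE byte-order mark indicates.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert UTF-16LE bytes (with or without the FF FE byte-order mark) to a
   NUL-terminated UTF-8 string. Surrogate pairs are combined; an unpaired
   surrogate is replaced by U+FFFD. `out` needs room for
   (len / 2) * 3 + 1 bytes. Returns the number of bytes written, not
   counting the NUL. */
static size_t utf16le_to_utf8(const unsigned char *in, size_t len, char *out)
{
    size_t i = 0, o = 0;
    if (len >= 2 && in[0] == 0xFF && in[1] == 0xFE) {
        i = 2;                                   /* skip the BOM */
    }
    while (i + 1 < len) {
        uint32_t cp = (uint32_t)in[i] | ((uint32_t)in[i + 1] << 8);
        i += 2;
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < len) {
            uint32_t lo = (uint32_t)in[i] | ((uint32_t)in[i + 1] << 8);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {  /* valid surrogate pair */
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                i += 2;
            }
        }
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;                         /* unpaired surrogate */
        }
        if (cp < 0x80) {
            out[o++] = (char)cp;
        } else if (cp < 0x800) {
            out[o++] = (char)(0xC0 | (cp >> 6));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out[o++] = (char)(0xE0 | (cp >> 12));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (char)(0xF0 | (cp >> 18));
            out[o++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    out[o] = '\0';
    return o;
}
```

The output contains no embedded NUL bytes for ordinary text, so it can be printed with printf("%s") or split into lines with the usual C string functions.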
If you don't mind reading all of a file you can do something like this:
NSData* myData = [NSData dataWithContentsOfFile:myFileWithPath];
and then do whatever you'd like with the data from there. You will get nil if the file doesn't exist.
If you are assuming text (string) data in this file you can additionally do something like this and then parse it as a NSString:
NSString* myString = [[NSString alloc] initWithBytes:[myData bytes] length:[myData length] encoding:NSUTF8StringEncoding];
Since you mentioned you are relatively new to Objective-C: NSString objects can be searched fairly easily. Have a look here for more info on this.
I wanted this as well and thought "do this instead" did not answer the question, so here is a working example. Beware that fgets keeps the \n delimiter at the end of each line it reads.
NSString * fName = [[NSBundle mainBundle] pathForResource:@"Sample" ofType:@"txt"];
FILE *fileHandle = fopen([fName UTF8String],"r");
if (fileHandle != NULL)
{
char space[1024];
while (fgets(space, 1024, fileHandle) != NULL)
{
NSLog(@"space = %s", space);
}
fclose(fileHandle);
}
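Because fgets keeps the line terminator, it is usually stripped before the line is used. A small sketch of that step in plain C follows; the wrapper name read_line is hypothetical and not part of the answer above.

```c
#include <stdio.h>
#include <string.h>

/* Read the next line from `file` into `line` (capacity `cap`) and strip the
   trailing "\n" or "\r\n" that fgets keeps. Returns 1 if a line was read,
   0 at end of file or on error. Lines longer than cap - 1 characters are
   returned in several pieces. */
static int read_line(FILE *file, char *line, size_t cap)
{
    if (fgets(line, (int)cap, file) == NULL) {
        return 0;
    }
    line[strcspn(line, "\r\n")] = '\0';  /* cut at the first CR or LF */
    return 1;
}
```

In the loop above, while (read_line(fileHandle, space, sizeof space)) would then log each line without the extra line break.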
printf("%s test\n");
You're not passing the string to printf. Try
printf("%s test\n", wah);
Also, if your file contains a line more than 199 characters long, fgets(wah, 200, libFile) will read only the first 199 characters into wah and then add a NUL, so the buffer is not overrun, but the rest of the line is returned by the next call. Only if the size you pass is larger than the buffer will fgets write off the end of wah and trample over something random, in which case the behaviour of your program is undefined and may set fire to your cat.
Slycrel's got it. Expanding on that answer, here is another (in my opinion, simpler) way of turning that data into a string:
NSString *myFileString = [[NSString alloc] initWithData:someData encoding:NSUTF8StringEncoding];
This declares a new NSString directly using the NSData specified.