How to read text chunks from a huge text file? - objective-c

I am trying to read a text file that contains billions of characters. Using contentsOfFile does not work; my application crashes when I call it.
Could anybody post sample code that lets me read the file in chunks, so that I load only the part I need at any given time?
Please reply as soon as possible.

I'm guessing this is an iOS app. In that case, you are likely hitting the memory limit by calling contentsOfFile:, because that method tries to read the entire contents of the file into memory. Remember that on iOS your app must play nice: if it consumes too much memory, the watchdog process will kill it to save the device from rebooting (which would otherwise happen because iOS devices have no disk to swap to).
Have you had a look at NSFileHandle? NSFileHandle supports seeking within a file. With some simple iteration you can use the following two methods to seek within the file and read chunks of data:
- (NSData *)readDataOfLength:(NSUInteger)length;
- (void)seekToFileOffset:(unsigned long long)offset;
It might look something like this. Assume pathToFile is an NSString containing the path to the text file to be read in.
unsigned long long offset = 0;
NSUInteger chunkSize = 1024; // Read 1 KB chunks.
NSFileHandle *handle = [NSFileHandle fileHandleForReadingAtPath:pathToFile];
NSData *data = [handle readDataOfLength:chunkSize];
while ([data length] > 0)
{
    // Choose the string encoding that matches the file. With a multi-byte
    // encoding such as UTF-8, a chunk boundary can split a character, in
    // which case initWithData:encoding: returns nil for that chunk.
    NSString *dataString = [[[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding] autorelease];
    /* PERFORM STRING PROCESSING HERE */
    /* END STRING PROCESSING */
    offset += [data length];
    [handle seekToFileOffset:offset];
    data = [handle readDataOfLength:chunkSize];
}
[handle closeFile];
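One thing to watch with fixed-size chunks of UTF-8 text is that a chunk can end in the middle of a multi-byte character. Here is a minimal sketch in plain C (which also compiles inside an Objective-C file) of how you might find the safe cut point. The function name utf8_complete_prefix is made up for this illustration, and it assumes the file really is valid UTF-8.

```c
#include <stddef.h>

/* Returns how many leading bytes of buf[0..len) end on a complete UTF-8
   character. The remaining bytes (at most 3) start a character that
   continues in the next chunk and should be carried over to it.
   Assumes valid UTF-8; on an unexpected byte it returns len unchanged. */
size_t utf8_complete_prefix(const unsigned char *buf, size_t len)
{
    size_t i = len;
    size_t back = 0;

    /* Step back over up to 3 trailing continuation bytes (10xxxxxx). */
    while (i > 0 && back < 3 && (buf[i - 1] & 0xC0) == 0x80) {
        i--;
        back++;
    }
    if (i == 0) {
        return 0; /* empty input, or nothing but continuation bytes */
    }

    /* buf[i - 1] should now be the lead byte of the last character. */
    unsigned char lead = buf[i - 1];
    size_t need;
    if ((lead & 0x80) == 0x00) {
        need = 1;
    } else if ((lead & 0xE0) == 0xC0) {
        need = 2;
    } else if ((lead & 0xF0) == 0xE0) {
        need = 3;
    } else if ((lead & 0xF8) == 0xF0) {
        need = 4;
    } else {
        return len; /* not a valid lead byte: leave the chunk as it is */
    }

    size_t have = len - (i - 1); /* bytes present for the last character */
    return (have >= need) ? len : i - 1;
}
```

You would convert only the first utf8_complete_prefix(...) bytes of each chunk to a string and prepend the leftover bytes to the next chunk before converting that one.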

A good idea is to look at the TextEdit source code, because I've opened massive files with TextEdit before, so there should be a way to do it. Not sure why your app is crashing though. It shouldn't have a problem.

Related

How to reliably retrieve NSData objects from NSInputStream in XCode

So my application works along these lines:
An iPod continuously sends NSDictionaries that contain: an image encoded in JPEG and some image properties as NSStrings.
The NSDictionary is encoded using NSPropertyListSerialization with the format NSPropertyListBinaryFormat_v1_0 and sent in packets of 1024 bytes via NSStream to the central computer running an app on OSX.
The OSX app receives the data packets, continuously appending to a single NSMutableData object, until it sees the first packet of the next NSData object (which in binary format I've found starts as 'bplist').
The NSData is converted back to an NSDictionary to be used by the OSX app, by calling NSPropertyListSerialization.
Once the NSData has been converted (successfully or not), the NSData object is reset to zero length to start reading the next round of packets.
A few more notes: both the NSInputStream and the NSOutputStream are running on their respective device's currentRunLoop in NSDefaultRunLoopMode.
When running this process, sometimes the conversion back to NSDictionary works fine with no errors (about 1/3 of the attempts), but the other times the conversion returns this error:
Error: Failed to convert NSData to NSDict : Error Domain=NSCocoaErrorDomain Code=3840 "Unexpected character b at line 1" UserInfo={NSDebugDescription=Unexpected character b at line 1, kCFPropertyListOldStyleParsingError=Error Domain=NSCocoaErrorDomain Code=3840 "Conversion of string failed." UserInfo={NSDebugDescription=Conversion of string failed.}}
Following are the parts of the program that parse the data from the stream:
... method to handle stream events:
-(void)stream:(NSStream *)aStream handleEvent:(NSStreamEvent)eventCode {
    switch(eventCode) {
        case NSStreamEventHasBytesAvailable: {
            uint8_t buf[1024];
            unsigned int len = (unsigned)[(NSInputStream *)aStream read:buf maxLength:1024];
            if(len) {
                [self handleEventBuffer:buf WithLength:len];
            }
            ...
... and the method that takes care of the data:
-(void)handleEventBuffer:(uint8_t*)buf WithLength:(unsigned int)len {
    ...
    NSString *bufStr = [NSString stringWithFormat:@"%s",(const char*)buf];
    if ([bufStr containsString:@"bplist00"] && [self.cameraData length] > 0) {
        // Detected new file, enter in all the old data and reset for new data
        NSError *error;
        NSDictionary *tempDict = [[NSDictionary alloc] init];
        tempDict = [NSPropertyListSerialization propertyListWithData:self.cameraData
                                                             options:0
                                                              format:NULL
                                                               error:&error];
        if (error != nil) {
            // Expected good file but no good file, erase and restart
            NSLog(@"Error: Failed to convert NSData to NSDict : %@", [error description]);
            [self.cameraData setLength:0];
        }
        ...
        [self.cameraData setLength:0];
        [self.cameraData appendBytes:buf length:len];
    } else {
        // Still receiving data
        [self.cameraData appendBytes:buf length:len];
    }
So, the question that I'm getting at is:
How can I fix my parsing method to give me reliable results that don't randomly fail to convert?
OR is there a better way than this to parse buffer streams for this purpose?
OR am I just doing something stupid or missing something obvious?
You appear to be relying on each write to the stream producing a matching read of the same size. Do you know that NSStream guarantees this? If it does not, any read could contain parts of two (or more) of your encoded dictionaries, and you would get the parsing errors you see.
Alternative approach:
For each encoded dictionary to send:
Write end:
1. Send a message containing the size in bytes of the encoded dictionary that will follow.
2. Write the encoded dictionary in chunks; the last chunk may be short.
3. Repeat.
Read end:
1. Read the size message, specifying its exact length in bytes.
2. Read the encoded dictionary in chunks, making sure you read only the number of bytes reported by (1).
3. Repeat.
Provided you are using a reliable communication stream this should enable you to read each encoded dictionary reliably. It avoids you trying to figure out where the boundary between each encoded dictionary is, as that information is part of your protocol.
HTH
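To make the size-prefix idea concrete, here is a minimal sketch in plain C (which also compiles inside an Objective-C file). The 4-byte big-endian header and the names frame_encode and frame_next are assumptions made for this illustration, not part of NSStream or of the question's code. The reader keeps appending received bytes to a buffer and asks frame_next whether a whole message has arrived, so it no longer matters how the stream splits the data into reads.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_HEADER_SIZE 4

/* Writes a 4-byte big-endian length followed by the payload into out.
   out must have room for FRAME_HEADER_SIZE + len bytes.
   Returns the total number of bytes written. */
size_t frame_encode(const uint8_t *payload, uint32_t len, uint8_t *out)
{
    out[0] = (uint8_t)(len >> 24);
    out[1] = (uint8_t)(len >> 16);
    out[2] = (uint8_t)(len >> 8);
    out[3] = (uint8_t)len;
    memcpy(out + FRAME_HEADER_SIZE, payload, len);
    return FRAME_HEADER_SIZE + (size_t)len;
}

/* Looks for one complete frame at the start of buf[0..buf_len).
   On success, points *payload at the message body, stores its size in
   *payload_len, and returns the number of bytes the frame occupies.
   Returns 0 if more bytes must be received first. */
size_t frame_next(const uint8_t *buf, size_t buf_len,
                  const uint8_t **payload, uint32_t *payload_len)
{
    if (buf_len < FRAME_HEADER_SIZE) {
        return 0; /* the size message itself is still incomplete */
    }
    uint32_t len = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
                   ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
    if (buf_len - FRAME_HEADER_SIZE < len) {
        return 0; /* the body is still incomplete */
    }
    *payload = buf + FRAME_HEADER_SIZE;
    *payload_len = len;
    return FRAME_HEADER_SIZE + (size_t)len;
}
```

On the receiving side you would append every read to your accumulated data, call frame_next in a loop, hand each complete payload to the property list parser, and remove the consumed bytes from the front of the buffer.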

Save NSData when memory mapped

What I am hoping to do may or may not be possible, but I'll give it a shot. I am attempting to load huge, multi-gigabyte text files. I currently use a memory-mapped NSData and load only a portion at a time: as the user scrolls, I load the currently visible chunk. This works well and barely uses any memory (<30 MB).
Next I wanted to add editing of the data. I casually assumed I could just edit the memory-mapped data and have the change reflected on disk. This does not seem to be the case. Is there a way to signal that the data has changed so that the file is updated? I am a little vague on how the file and the NSData are linked.
Here is how it is currently working.
// create the memory mapped data:
self.fileData = [NSMutableData dataWithContentsOfURL:url options:NSDataReadingMappedAlways error:outError];
//load a chunk
self.currentRange = NSMakeRange(0, 4096);
void* data = malloc(4096);
[self.fileData getBytes:data range:self.currentRange];
NSString* currentView = [[NSString alloc] initWithBytes:data length:4096 encoding:NSUTF8StringEncoding];
//replace a chunk:
NSData* currentData = [[self.textArea string] dataUsingEncoding:NSUTF8StringEncoding];
[self.fileData replaceBytesInRange:self.currentRange withBytes:[currentData bytes] length:[currentData length]];
//If I close and re-open it doesn't actually work
Is there a way of doing this or would it be possible to create an NSOutputStream to the URL and somehow stream portions of it back to disk? Is there another way to write/save memory mapped data? If I try to do a writeToURL: or comparable function it will load it all to actual memory which is really no good.
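For what it's worth, one way to get edits written through to disk is to drop down to POSIX mmap with MAP_SHARED, since changes made through a shared mapping are carried back to the file. Below is a minimal sketch in plain C (which also compiles inside an Objective-C file). The function name overwrite_mapped_range is made up for this illustration, and it only overwrites a range with the same number of bytes; inserting or deleting text changes the file length and would need the tail of the file to be moved as well.

```c
#define _POSIX_C_SOURCE 200809L /* ask for POSIX declarations */
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Overwrites len bytes at offset in the file at path through a shared
   memory mapping. Only the pages around the range are mapped, so the
   file can be far larger than available memory.
   Returns 0 on success and -1 on failure. */
int overwrite_mapped_range(const char *path, off_t offset,
                           const void *bytes, size_t len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0) {
        return -1;
    }

    struct stat st;
    if (fstat(fd, &st) != 0 || offset < 0 || len == 0 ||
        offset > st.st_size || (off_t)len > st.st_size - offset) {
        close(fd);
        return -1; /* the range must lie inside the existing file */
    }

    /* mmap offsets must be page aligned, so map from the page start. */
    long page = sysconf(_SC_PAGESIZE);
    off_t aligned = offset - (offset % page);
    size_t delta = (size_t)(offset - aligned);
    size_t map_len = delta + len;

    unsigned char *map = mmap(NULL, map_len, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, aligned);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    memcpy(map + delta, bytes, len);
    int rc = msync(map, map_len, MS_SYNC); /* flush the change to disk */
    munmap(map, map_len);
    close(fd);
    return rc;
}
```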

NSTask: why program is blocking when read from NSPipe?

I use NSTask to run a shell command and read its output through an NSPipe. At first I used the method below to read the output, and it worked without any problem.
- (void)outputAvailable:(NSNotification *)aNotification {
    NSString *newOutput = nil;
    NSData *taskData = nil;
    if((taskData = [readHandle availableData]) && [taskData length])
        newOutput = [[NSString alloc] initWithData:taskData encoding:NSASCIIStringEncoding];
    NSLog(@"%@", newOutput);
    [readHandle readInBackgroundAndNotify];
}
The problem with this method is that it only outputs 4096 bytes of data. So I added a while loop to get more data and modified the method like this:
- (void)outputAvailable:(NSNotification *)aNotification {
    NSString *newOutput;
    NSMutableData *allData = [[NSMutableData alloc] init]; //Added.
    NSData *taskData = nil;
    while ((taskData = [readHandle availableData]) && [taskData length]) {
        [allData appendData:taskData];
    }
    newOutput = [[NSString alloc] initWithData:allData encoding:NSASCIIStringEncoding];
    NSLog(@"%@", newOutput);
    [readHandle readInBackgroundAndNotify];
}
Then a problem occurs: the program blocks in the while loop and never reaches the following statements. I have verified that allData contains what I wanted, but after the last chunk is appended, it blocks.
Could you give me some solutions? Thanks.
Your while() loop effectively blocks further notifications, causing the whole program to block waiting for something to flush the buffer.
You should readInBackgroundAndNotify, then pull off availableData on each notification, appending it to your NSMutableData (which is likely held in an instance variable). When you handle the notification, don't attempt to wait for more data or do any kind of a while loop. The system will notify you when more data is available.
I.e. the system pushes data to you, you do not pull data from the system.
Ahh... OK. You should still only pull data when there is data available. Your while() loop is doing that. Not enough coffee. My bad.
The final block is most likely because your external process is not closing the pipe; no EOF is received and, thus, the program is waiting forever for more data that never arrives.
Either:
- make sure the background task exits, or
- detect when you've received enough data and terminate the process.
If you are doing some kind of conversion program (say, tr) where you write data on the process's standard input, then you might need to close the standard input pipe.
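The EOF behaviour is easy to see at the file descriptor level. In this plain C sketch (which also compiles inside an Objective-C file), read only returns 0 once every write end of the pipe has been closed; until then a read on an empty pipe simply waits. The helper name read_until_eof is made up for this illustration.

```c
#define _POSIX_C_SOURCE 200809L /* ask for POSIX declarations */
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Reads from fd until end-of-file or until cap bytes have been stored
   in out. Returns the number of bytes stored, or -1 on a read error.
   read() reports end-of-file (0) only after all write ends of the pipe
   are closed, which is why a child that never exits keeps us waiting. */
long read_until_eof(int fd, char *out, size_t cap)
{
    size_t total = 0;
    while (total < cap) {
        ssize_t n = read(fd, out + total, cap - total);
        if (n > 0) {
            total += (size_t)n;
        } else if (n == 0) {
            break; /* EOF: the writer closed its end */
        } else if (errno == EINTR) {
            continue; /* interrupted by a signal, try again */
        } else {
            return -1;
        }
    }
    return (long)total;
}
```

If the write end were left open, the loop above would block in read after the last byte, which matches the hang described in the question.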

iTunes File Sharing app: realtime monitoring for incoming datas

I'm working on an iOS project that supports the iTunes file sharing feature. The goal is to track incoming and changed data in real time.
I'm using a (somewhat modified) DirectoryWatcher class from Apple's sample code
and have also tried this source code.
The data is an NSBundle (*.bundle), and some bundles are in the 100-500 MB range depending on their content (video and audio). Each bundle contains an XML-based descriptor file.
The problem is that both approaches fire a notification when the data has just started copying, not when the copy/change/remove process has finished completely.
Tried next:
checking file attributes:
NSDictionary *fileAttrs = [[NSFileManager defaultManager] attributesOfItemAtPath:[contURL path] error:nil];
BOOL fileBusy = [[fileAttrs objectForKey:NSFileBusy] boolValue];
looking for the fileSize changes:
dispatch_async(_checkQueue, ^{
    for (NSURL *contURL in tempBundleURLs) {
        NSInteger lastSize = 0;
        NSDictionary *fileAttrs = [[NSFileManager defaultManager] attributesOfItemAtPath:[contURL path] error:nil];
        NSInteger fileSize = [[fileAttrs objectForKey:NSFileSize] intValue];
        do {
            lastSize = fileSize;
            [NSThread sleepForTimeInterval:1];
            fileAttrs = [[NSFileManager defaultManager] attributesOfItemAtPath:[contURL path] error:nil];
            fileSize = [[fileAttrs objectForKey:NSFileSize] intValue];
            NSLog(@"doing job");
        } while (lastSize != fileSize);
        NSLog(@"next job");
    }
});
Are there any other solutions?
The solution above works well for binary files, but not for a .bundle (a .bundle is actually a directory). To make it work with a .bundle, iterate over each file inside it.
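A rough sketch of that idea in plain C (which also compiles inside an Objective-C file): sum the sizes of all regular files below the bundle directory, and treat the copy as finished once that total stops changing between two polls. The function name directory_total_size is made up for this illustration, and a stable total is only a heuristic, not a guarantee that the transfer is complete.

```c
#define _POSIX_C_SOURCE 200809L /* ask for POSIX declarations */
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Returns the summed size in bytes of all regular files below path,
   descending into subdirectories. Returns -1 if a directory cannot be
   opened or an entry cannot be examined. */
long long directory_total_size(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL) {
        return -1;
    }

    long long total = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0 ||
            strcmp(entry->d_name, "..") == 0) {
            continue;
        }

        char child[PATH_MAX];
        int n = snprintf(child, sizeof child, "%s/%s", path, entry->d_name);
        if (n < 0 || (size_t)n >= sizeof child) {
            total = -1; /* path too long for the buffer */
            break;
        }

        struct stat st;
        if (lstat(child, &st) != 0) {
            total = -1;
            break;
        }
        if (S_ISDIR(st.st_mode)) {
            long long sub = directory_total_size(child);
            if (sub < 0) {
                total = -1;
                break;
            }
            total += sub;
        } else if (S_ISREG(st.st_mode)) {
            total += (long long)st.st_size;
        }
    }
    closedir(dir);
    return total;
}
```

In the polling loop from the question you would compare two successive results of this function for the bundle path instead of comparing NSFileSize of the bundle itself.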
You can use GCD's dispatch sources: with them you can observe particular system events (in your case vnode events, since you're working with the file system).
To set up an observer for a particular directory, I used code like this:
- (dispatch_source_t) fileSystemDispatchSourceAtPath:(NSString*) path
{
    int fileDescr = open([path fileSystemRepresentation], O_EVTONLY);// observe file system events for particular path - you can pass here Documents directory path
    //observer queue is my private dispatch_queue_t object
    dispatch_source_t source = dispatch_source_create(DISPATCH_SOURCE_TYPE_VNODE, fileDescr, DISPATCH_VNODE_ATTRIB| DISPATCH_VNODE_WRITE|DISPATCH_VNODE_LINK|DISPATCH_VNODE_EXTEND, observerQueue);// create dispatch_source object to observe vnode events
    dispatch_source_set_registration_handler(source, ^{
        NSLog(@"registered for observation");
        //event handler is called each time file system event of selected type (DISPATCH_VNODE_*) has occurred
        dispatch_source_set_event_handler(source, ^{
            dispatch_source_vnode_flags_t flags = dispatch_source_get_data(source);//obtain flags
            NSLog(@"%lu",flags);
            if(flags & DISPATCH_VNODE_WRITE)//flag is set to DISPATCH_VNODE_WRITE every time data is appended to file
            {
                NSLog(@"DISPATCH_VNODE_WRITE");
                NSDictionary* dict = [[NSFileManager defaultManager] attributesOfItemAtPath:path error:nil];
                float size = [[dict valueForKey:NSFileSize] floatValue];
                NSLog(@"%f",size);
            }
            if(flags & DISPATCH_VNODE_ATTRIB)//this flag is passed when file is completely written.
            {
                NSLog(@"DISPATCH_VNODE_ATTRIB");
                dispatch_source_cancel(source);
            }
            if(flags & DISPATCH_VNODE_LINK)
            {
                NSLog(@"DISPATCH_VNODE_LINK");
            }
            if(flags & DISPATCH_VNODE_EXTEND)
            {
                NSLog(@"DISPATCH_VNODE_EXTEND");
            }
            NSLog(@"file = %@",path);
            NSLog(@"\n\n");
        });
        dispatch_source_set_cancel_handler(source, ^{
            close(fileDescr);
        });
    });
    //we have to resume dispatch_objects
    dispatch_resume(source);
    return source;
}
I found two rather reliable (i.e. not 100% reliable but reliable enough for my needs) approaches, which only work in conjunction with polling the contents of the directory:
1. Check NSURLContentModificationDateKey. While the file is being transferred, this value is set to the current date. After the transfer has finished, it is set to the value the original file had: BOOL busy = (-1.0 * [modDate timeIntervalSinceNow]) < pollInterval;
2. Check NSURLThumbnailDictionaryKey. While the file is being transferred, this value is nil; afterwards it contains a thumbnail, but probably only for file types from which the system can produce a thumbnail. That is not a problem for me because I only care about images and videos, but it may be for you. While this is more reliable than solution 1, it hammers the CPU quite a bit and may even cause your app to get killed if you have a lot of files in the import directory.
Dispatch sources and polling can be combined, i.e. when a dispatch source detects a change, start polling until no busy files are left.

What is the typical approach taken for writing NSString objects to a file?

As an example, I want to write out some string value, str, to a file, "yy", as it changes through each iteration of a loop. Below is how I'm currently implementing it:
NSOutputStream *oStream = [[NSOutputStream alloc] initToFileAtPath:@"yy" append:NO];
[oStream open];
while ( ... )
{
    NSString *str = /* has already been set up */
    NSData *strData = [str dataUsingEncoding:NSUTF8StringEncoding];
    [oStream write:(const uint8_t *)[strData bytes] maxLength:[strData length]];
    ...
}
[oStream close];
Ignoring the UTF-8 encoding, which I do require, is this how NSStrings are typically written to a file? That is, by first converting to an NSData object, then using bytes and casting the result to uint8_t for NSOutputStream's write?
Are there alternatives that are used more often?
EDIT: Multiple NSStrings need to be appended to the same file, hence the loop above.
ctshryock's solution is a bit different from yours, since you're appending as you go, and ctshryock's writes a single file all at once (overwriting the previous version). For a long-running process, these are going to be radically different.
If you do want to write as you go, I typically use NSFileHandle here rather than NSOutputStream. It's just a little bit easier to use in my opinion.
// fileHandleForWritingAtPath: returns nil if no file exists at aPath, so create the file first if needed.
NSFileHandle *fileHandle = [NSFileHandle fileHandleForWritingAtPath:aPath];
while ( ... )
{
    NSString *str = /* has already been set up */
    [fileHandle writeData:[str dataUsingEncoding:NSUTF8StringEncoding]];
}
[fileHandle closeFile];
Note that fileHandle will automatically close when it is deallocated, but I like to go ahead and close it explicitly when I'm done with it.
I typically use NSString's methods writeToFile:atomically:encoding:error: or writeToURL:atomically:encoding:error: for writing to file.
See the String Programming Guide for Cocoa for more info
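To illustrate what the atomically: flag buys you, as far as I understand it: the contents are first written to an auxiliary file, which is then moved over the destination, so a reader never sees a half-written file. Here is a minimal sketch of that technique in plain C (which also compiles inside an Objective-C file). The function name write_text_atomically and the ".tmp" suffix are made up for this illustration; this is not how Foundation names its auxiliary file.

```c
#include <stdio.h>
#include <string.h>

/* Writes text to a temporary file next to path and then renames it over
   path. On POSIX systems rename() replaces an existing file in a single
   step, so other readers see either the old or the new contents.
   Returns 0 on success and -1 on failure. */
int write_text_atomically(const char *path, const char *text)
{
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp) {
        return -1; /* path too long for the buffer */
    }

    FILE *f = fopen(tmp, "wb");
    if (f == NULL) {
        return -1;
    }
    size_t len = strlen(text);
    int ok = (fwrite(text, 1, len, f) == len);
    if (fclose(f) != 0) {
        ok = 0; /* buffered data could not be flushed */
    }

    if (!ok || rename(tmp, path) != 0) {
        remove(tmp); /* do not leave a partial temporary file behind */
        return -1;
    }
    return 0;
}
```

This replaces the whole file on every call, which is why it suits "write everything at once" and not the append-as-you-go loop from the question.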