Writing in a file from multiple threads - objective-c

I'm writing a download manager in Objective-C which downloads a file in multiple segments at the same time in order to improve the speed. Each segment of the file is downloaded in its own thread.
At first, I thought to write each segment in a different file and to put together all the files at the end of the download. But for many reasons, it's not a good solution.
So I'm looking for a way to write to a file at a specific position that can handle multiple threads, because in my application each segment is downloaded in its own thread.
In Java, I know that FileChannel does the trick perfectly but I have no idea in Objective-C.

The answers given thus far have some clear disadvantages:
File i/o using system calls definitely has some disadvantages regarding locking.
Caching parts in memory leads to serious issues in a memory constrained environment. (i.e. any computer)
A thread safe, efficient, lock free approach would be to use memory mapping, which works as follows:
create the result file of (at least) the total length needed
open() the file for read/write
mmap() it to some place in memory. The file now "lives" in memory.
write received parts in memory at the right offset in the file
keep track if all pieces have been received (e.g. by posting some selector on the main thread for every piece received and stored)
munmap() the memory and close() the file
The actual writing is handled by the kernel - your program will never issue a write system call of any form. Memory mapping generally has few downsides and is used extensively for things like shared libraries.
update: a piece of code says more than 1000 words... This is the mmap version of Mecki's lock-based multi-thread file writer. Note that writing is reduced to a simple memcpy, which returns no error code, so there is no BOOL success to check. (An I/O failure on the mapped file, such as a full disk, is typically reported as a signal such as SIGBUS rather than as a return value.) Performance is equivalent to the lock-based version (tested by writing 100 1 MB blocks in parallel).
Regarding a comment on "overkill" of an mmap-based approach: this uses fewer lines of code, doesn't require locking, is less likely to block on writing, and requires no checking of return values on writing. The only "overkill" would be that it requires the developer to understand a concept other than good old read/write file I/O.
The possibility to read directly into the mmapped memory region is left out, but is quite simple to implement. You can just read(fd,i_filedata+offset,length); or recv(socket,i_filedata+offset,length,flags); directly into the file.
@interface MultiThreadFileWriterMMap : NSObject
{
@private
FILE * i_outputFile;
NSUInteger i_length;
unsigned char *i_filedata;
}
- (id)initWithOutputPath:(NSString *)aFilePath length:(NSUInteger)length;
- (void)writeBytes:(const void *)bytes ofLength:(size_t)length
toFileOffset:(off_t)offset;
- (void)writeData:(NSData *)data toFileOffset:(off_t)offset;
- (void)close;
@end
#import "MultiThreadFileWriterMMap.h"
#import <stdio.h>
#import <string.h>
#import <sys/mman.h>
#import <sys/types.h>
#import <unistd.h>
@implementation MultiThreadFileWriterMMap
- (id)initWithOutputPath:(NSString *)aFilePath length:(NSUInteger)length
{
self = [super init];
if (self) {
i_outputFile = fopen([aFilePath UTF8String], "w+");
i_length = length;
if ( i_outputFile ) {
ftruncate(fileno(i_outputFile), i_length);
i_filedata = mmap(NULL,i_length,PROT_WRITE,MAP_SHARED,fileno(i_outputFile),0);
if ( i_filedata == MAP_FAILED ) perror("mmap");
}
if ( !i_outputFile || i_filedata==MAP_FAILED ) {
[self release];
self = nil;
}
}
return self;
}
- (void)dealloc
{
[self close];
[super dealloc];
}
- (void)writeBytes:(const void *)bytes ofLength:(size_t)length
toFileOffset:(off_t)offset
{
memcpy(i_filedata+offset,bytes,length);
}
- (void)writeData:(NSData *)data toFileOffset:(off_t)offset
{
memcpy(i_filedata+offset,[data bytes],[data length]);
}
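- (void)close
{
// Guard each step so close is safe after a failed init and when called twice
// (dealloc calls it again after an explicit close).
if (i_filedata && i_filedata != MAP_FAILED) munmap(i_filedata,i_length);
i_filedata = NULL;
if (i_outputFile) fclose(i_outputFile);
i_outputFile = NULL;
}
@end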
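
For readers who want to try the same steps outside a class, here is a minimal plain-C sketch (which also compiles as Objective-C). It is not the code above, only an illustration under assumptions: the names `struct segment`, `store_segment` and `write_segments` are made up, each segment is already in memory, and one thread per segment is started just to show that the copies need no lock as long as the segments do not overlap.

```c
#include <fcntl.h>
#include <pthread.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* One downloaded segment (hypothetical type for this sketch). */
struct segment {
    unsigned char *map;   /* base address of the mapped file, set by write_segments */
    off_t offset;         /* position of this segment in the file */
    const void *bytes;    /* segment payload */
    size_t length;        /* payload size in bytes */
};

/* Thread body: copy one segment into its slot in the mapping. */
static void *store_segment(void *arg)
{
    struct segment *s = arg;
    memcpy(s->map + s->offset, s->bytes, s->length);
    return NULL;
}

/* Creates `path` with `total` bytes, lets one thread per segment fill in its
   part, then unmaps and closes. Returns 0 on success, -1 on error. */
int write_segments(const char *path, size_t total,
                   struct segment *segs, size_t count)
{
    enum { MAX_THREADS = 64 };
    pthread_t threads[MAX_THREADS];
    size_t started = 0;
    int result = 0;

    if (total == 0 || count > MAX_THREADS) return -1;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;
    if (ftruncate(fd, (off_t)total) != 0) { close(fd); return -1; }

    unsigned char *map = mmap(NULL, total, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) { close(fd); return -1; }

    for (size_t i = 0; i < count; i++) {
        /* Reject segments that would fall outside the mapping. */
        if (segs[i].offset < 0 || (size_t)segs[i].offset > total ||
            segs[i].length > total - (size_t)segs[i].offset) {
            result = -1;
            break;
        }
        segs[i].map = map;
        if (pthread_create(&threads[i], NULL, store_segment, &segs[i]) != 0) {
            result = -1;
            break;
        }
        started++;
    }
    for (size_t i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    if (munmap(map, total) != 0) result = -1;
    if (close(fd) != 0) result = -1;
    return result;
}
```

In a real download manager the threads would be the download threads themselves, copying (or receiving) each chunk straight into the mapping as it arrives.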

Queue up the segment objects, as they are received, to a writer thread. The writer thread should keep a list of out-of-order objects so that the actual disk writing is sequential. If a segment download fails, it can be pushed back onto the downloading thread pool for another try (perhaps an internal retry count should be kept). I suggest a pool of segment objects, so that one or more failed downloads of one segment cannot result in runaway memory use as later segments are downloaded and added to the list.
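
The reordering part of this idea could look roughly like the following C sketch. It only covers what the writer thread does with each segment it is handed; the queue between threads, the retry logic and the segment pool are left out. All names (`struct ordered_writer`, `ordered_writer_accept`, `MAX_SEGMENTS`) are invented for the example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { MAX_SEGMENTS = 64 };   /* arbitrary limit for this sketch */

struct ordered_writer {
    FILE *out;
    size_t next;                      /* index of the next segment to write */
    void *pending[MAX_SEGMENTS];      /* copies of segments that arrived early */
    size_t pending_len[MAX_SEGMENTS];
};

void ordered_writer_init(struct ordered_writer *w, FILE *out)
{
    /* The compound literal zero-initializes every other member. */
    *w = (struct ordered_writer){ .out = out };
}

/* Hands segment number `index` to the writer. The segment is held back until
   all earlier segments have been written, so the file is written strictly
   front to back. Returns 0 on success, -1 on error; after a write error the
   segment stays queued and the caller should treat the download as failed. */
int ordered_writer_accept(struct ordered_writer *w, size_t index,
                          const void *bytes, size_t length)
{
    if (index >= MAX_SEGMENTS || index < w->next ||
        w->pending[index] != NULL || length == 0)
        return -1;

    void *copy = malloc(length);
    if (copy == NULL) return -1;
    memcpy(copy, bytes, length);
    w->pending[index] = copy;
    w->pending_len[index] = length;

    /* Flush every segment that is now contiguous with what is on disk. */
    while (w->next < MAX_SEGMENTS && w->pending[w->next] != NULL) {
        size_t i = w->next;
        if (fwrite(w->pending[i], w->pending_len[i], 1, w->out) != 1)
            return -1;
        free(w->pending[i]);
        w->pending[i] = NULL;
        w->next++;
    }
    return 0;
}
```

Because only the writer thread touches the `ordered_writer`, this part needs no locking of its own; the synchronization lives in whatever queue delivers the segments.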

Never forget that Obj-C is based on plain C, so I would just write my own class that handles file I/O using the standard C API. It allows you to place the current write position anywhere within a new file, even far beyond the current file size (missing bytes are filled with zero bytes), as well as to jump forward and backward as you wish. The easiest way to achieve thread safety is to use a lock. This is not necessarily the fastest way, but in your specific case I bet that the bottleneck is certainly not thread synchronization. The class could have a header like this:
@interface MultiThreadFileWriter : NSObject
{
@private
FILE * i_outputFile;
NSLock * i_fileLock;
}
- (id)initWithOutputPath:(NSString *)aFilePath;
- (BOOL)writeBytes:(const void *)bytes ofLength:(size_t)length
toFileOffset:(off_t)offset;
- (BOOL)writeData:(NSData *)data toFileOffset:(off_t)offset;
- (void)close;
@end
And an implementation similar to this one:
#import "MultiThreadFileWriter.h"
@implementation MultiThreadFileWriter
- (id)initWithOutputPath:(NSString *)aFilePath
{
self = [super init];
if (self) {
i_fileLock = [[NSLock alloc] init];
i_outputFile = fopen([aFilePath UTF8String], "w");
if (!i_outputFile || !i_fileLock) {
[self release];
self = nil;
}
}
return self;
}
- (void)dealloc
{
[self close];
[i_fileLock release];
[super dealloc];
}
- (BOOL)writeBytes:(const void *)bytes ofLength:(size_t)length
toFileOffset:(off_t)offset
{
BOOL success;
[i_fileLock lock];
success = i_outputFile != NULL
&& fseeko(i_outputFile, offset, SEEK_SET) == 0
&& fwrite(bytes, length, 1, i_outputFile) == 1;
[i_fileLock unlock];
return success;
}
- (BOOL)writeData:(NSData *)data toFileOffset:(off_t)offset
{
return [self writeBytes:[data bytes] ofLength:[data length]
toFileOffset:offset
];
}
- (void)close
{
[i_fileLock lock];
if (i_outputFile) {
fclose(i_outputFile);
i_outputFile = NULL;
}
[i_fileLock unlock];
}
@end
The lock could be avoided in various ways. Using Grand Central Dispatch and blocks to schedule the seek + write operations on a serial queue would work. Another way would be to use UNIX (POSIX) file descriptors instead of standard C streams (open() and int instead of FILE * and fopen()) and give each thread its own descriptor, positioned at a different file offset, which avoids further seek operations on each write and also the locking, since POSIX I/O calls are thread-safe. Note that the descriptors have to come from separate open() calls: descriptors duplicated with dup() share a single file offset. Alternatively, pwrite() takes the offset as an argument and never touches the shared file position. However, both implementations would be somewhat more complicated and less portable, and there would be no measurable speed improvement.

Related

monitoring ifstream read progress from separate thread in Obj-C

This is the code I'm using to write and read a file in the background using GCD.
#import "AppDelegate.h"
#import <dispatch/dispatch.h>
#import <iostream>
#import <fstream>
size_t fileSize = 1024 * 1024 * 10;
std::ofstream *osPtr = 0;
std::ifstream *isPtr = 0;
@implementation AppDelegate
- (void)applicationDidFinishLaunching:(NSNotification *)aNotification
{
// Insert code here to initialize your application
const float framerate = 40;
const float frequency = 1.0f/framerate;
[NSTimer scheduledTimerWithTimeInterval:frequency
target:self selector:@selector(doSomething)
userInfo:nil repeats:YES];
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
std::ofstream os("myFile", std::ios::binary);
if (os) {
osPtr = &os;
for (int i = 0; i<fileSize; i++) {
os << 'c';
}
osPtr = 0;
os.close();
printf("write done\n");
}
std::ifstream is("myFile", std::ios::binary);
if (is) {
is.seekg(0, std::ifstream::end);
fileSize = (size_t)is.tellg();
is.seekg(0, std::ifstream::beg);
isPtr = &is;
while ( is.good() )
{
char c;
is >> c;
}
isPtr = 0;
is.close();
printf("read done\n");
}
});
}
- (void)doSomething
{
// write file progress indicator
if (osPtr)
printf("%5.1f\n", (float)osPtr->tellp()/fileSize*100.0f);
// read file progress indicator
if (isPtr)
printf("%5.1f\n", (float)isPtr->tellg()/fileSize*100.0f);
}
@end
It writes OK, but when reading big files (5 MB or more) an EXC_BAD_ACCESS error is thrown inside the streambuf class code.
template <class _CharT, class _Traits>
inline _LIBCPP_INLINE_VISIBILITY
typename basic_streambuf<_CharT, _Traits>::int_type
basic_streambuf<_CharT, _Traits>::sbumpc()
{
if (__ninp_ == __einp_)
return uflow();
return traits_type::to_int_type(*__ninp_++); //<---EXC_BAD_ACCESS
}
This is the project test.zip
Does the documentation of std::ofstream say that it is thread safe? I don't think so.
My bet would be that you always get a crash if your progress function is called while the osPtr or isPtr exists. But for small files, the writing/reading is so fast that they are both gone before your progress method is ever called.
The best way to read and write files asynchronously would be to use the GCD IO functions...
There is a convenience read function (and a similar write function).
void dispatch_read(
dispatch_fd_t fd,
size_t length,
dispatch_queue_t queue,
void (^handler)(dispatch_data_t data, int error));
Your handler block would be called back each time the system had some data ready to be read, and you could update your progress indicator.
You could use dispatch_after with the same queue, and they would be automatically serialized (as long as you used a serial queue).
However, just to be clear: your problem is that you are accessing the stream objects from multiple threads at the same time. One thread is running the queue code block, and another is running your timer call. They are both trying to access the same stream objects. Bad news.
If you want to continue to use your method of IO, you need to serialize access in one of several ways. You can create a class that provides safe access to an IOStream across multiple threads, or you can serialize the access yourself with locks. Both C++ and Obj-C provide many synchronization APIs.
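
As one possible shape for the lock-based option, the I/O code can publish its progress into a small shared counter, and the timer reads only that counter and never the stream itself. This is a sketch in plain C with pthreads; the `progress_*` names are invented for the example.

```c
#include <pthread.h>
#include <stddef.h>

/* Progress shared between the I/O thread and the thread that displays it. */
struct progress {
    pthread_mutex_t lock;
    size_t done;    /* bytes processed so far */
    size_t total;   /* bytes expected overall */
};

void progress_init(struct progress *p, size_t total)
{
    pthread_mutex_init(&p->lock, NULL);
    p->done = 0;
    p->total = total;
}

/* Called by the I/O thread after each chunk it has read or written. */
void progress_add(struct progress *p, size_t bytes)
{
    pthread_mutex_lock(&p->lock);
    p->done += bytes;
    pthread_mutex_unlock(&p->lock);
}

/* Called by the timer: returns the percentage completed, 0.0 to 100.0. */
double progress_percent(struct progress *p)
{
    pthread_mutex_lock(&p->lock);
    double percent = p->total
        ? 100.0 * (double)p->done / (double)p->total
        : 0.0;
    pthread_mutex_unlock(&p->lock);
    return percent;
}
```

With this in place the stream objects stay private to the block running on the queue, which removes the concurrent tellp()/tellg() calls.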
However, there is a very common idiom used in lots of Apple code: the delegate.
In your case, a simple progress delegate, with a method that sends the current progress, would suffice. This way the delegate is called from within the context of the long running task, which means you have synchronized access to any shared data.
If you want, you can dispatch any GUI work to the main thread with GCD.
So this would be my approach.
I need to subclass std::streambuf, even when it can be overkill (like it is posted here or here), at least for a feature that should be quite common in multithreaded applications.

How to Change SQL Database Stored in a Singleton?

I have an app which pretty much follows the method described here. The key code is as follows:
#import <Foundation/Foundation.h>
#import <sqlite3.h>
@interface FailedBankDatabase : NSObject {
sqlite3 *_database;
}
+ (FailedBankDatabase*)database;
- (NSArray *)failedBankInfos;
@end
#import "FailedBankDatabase.h"
#import "FailedBankInfo.h"
@implementation FailedBankDatabase
static FailedBankDatabase *_database;
+ (FailedBankDatabase*)database {
if (_database == nil) {
_database = [[FailedBankDatabase alloc] init];
}
return _database;
}
- (id)init {
if ((self = [super init])) {
NSString *sqLiteDb = [[NSBundle mainBundle] pathForResource:@"banklist"
ofType:@"sqlite3"];
if (sqlite3_open([sqLiteDb UTF8String], &_database) != SQLITE_OK) {
NSLog(@"Failed to open database!");
}
}
return self;
}
- (void)dealloc {
sqlite3_close(_database);
[super dealloc];
}
Now, the app works with one database as expected. But I want to be able to switch to a different database when the user touches a button. I have the button handler and logic OK, and I store the name of the database to be used and can retrieve it. But no matter what I do, I always get the same (original) database being called. I fear that the handle associated with _database, an object of type sqlite3, in the example is not being changed properly, so I don't open the database properly. How should I go about changing this? You can't re-init a singleton, but I need to change what's stored in it, in this case _database. Thanks.
EDIT: I would add that _database is a pointer. So I need to open a new database (and close the first, I guess) and give _database a new address in the process.
I had the same problem, but couldn't modify the databases (they were used in other projects).
So I created a method called useDatabase: that closes the previous connection and opens a new one.
The steps:
Your - (id)init remains the same
In FailedBankDatabase, you create a method that closes the current database and opens the one with the given name
-(void)useDatabase:(NSString*)database {
sqlite3_close(_database);
NSString *sqLiteDb = [[NSBundle mainBundle] pathForResource:database
ofType:@"sqlite3"];
if (sqlite3_open([sqLiteDb UTF8String], &_database) != SQLITE_OK) {
NSLog(@"Failed to open database!");
}
}
At the very beginning (for example in appDidFinishLaunching), you call the singleton once
[FailedBankDatabase database];
so that it is first initialised.
Then, when you want to change the .sqlite file used, you can call:
[[FailedBankDatabase database] useDatabase:@"anOtherDatabase"];
I think you can do this when you don't have to change the database very often. In my case, I use this once at the very first screen, with 3 buttons, where I choose which database will be used.
For more complicated cases, for example involving multithreading, you should not do this, since it closes the connection for a short time while it may be in use elsewhere.
Hope it helps,
Jery
After some additional study, I was unsuccessful in answering the question as asked. It looks like FMDB can probably handle the task, but I didn't want to add a large framework to my project. I solved my problem in an entirely different way: I modified each database to give it an identifying column, combined them, and modified the query so that it selects only the chunk from the original database that is wanted. This approach will only work when the databases have the same structure, of course.

A mutex blocks only the main thread when it reaches its call with the @synchronized directive

I'm building a multithreaded application, from which more than one thread can write to an sqlite3 database including the main thread. I declared a static public variable to be used for the mutex:
@implementation Application
#pragma mark -
#pragma mark Static Initializer
static NSString * SubmitChangesLock = nil;
+ (void)initialize {
[super initialize];
SubmitChangesLock = [[NSString alloc] initWithString:@"Submit-Changes-Lock"];
}
+ (NSString *)submitChangesLock {
return SubmitChangesLock;
}
@end
and inside each method that should write to a database I'm using that variable with the @synchronized directive to lock the section that writes to the database.
- (void)method1FromClass1 {
@synchronized ([Application submitChangesLock]) {
// write to the database here...
}
}
- (void)method2FromClass2 {
@synchronized ([Application submitChangesLock]) {
// write to the database here...
}
}
Everything worked fine, but sometimes when one of those methods is called from the main thread, it freezes waiting for the mutex to be unlocked, and the mutex never is. This only occurs on some calls from the main thread, and the code that writes to the database is definitely finite, so I cannot determine why the main thread keeps waiting for the mutex or why it isn't being unlocked in the first place.
Note: none of the other threads got blocked by this mutex, only the main one.
EDIT:
I tried to replace the @synchronized directive with performSelectorOnMainThread:withObject:waitUntilDone:
- (void)writeToDatabase {
// write to the database here...
}
- (void)method2FromClass2 {
[self performSelectorOnMainThread:@selector(writeToDatabase) withObject:nil waitUntilDone:YES];
}
and it's working just fine, but I'm trying to avoid putting so much load on the main thread so as not to block user interaction.
Any help would be greatly appreciated, and many thanks in advance.
There are features in iOS which exist to help you avoid dealing with threads/locking in simple to moderately complex situations.
If you set up an NSOperationQueue and setMaxConcurrentOperationCount: to 1, as suggested by deanWombourne, you can offload all the work to a background thread. There's even a handy class (NSInvocationOperation) to easily reuse your code from existing classes in a queue.
If these methods that are running in the background will affect what's appearing in the UI, you can always use performSelectorOnMainThread:withObject:waitUntilDone: to update whatever is necessary.
If you do it like this, you'll never be blocking your main thread with DB activity. Since blocking the main thread freezes the UI, this is definitely the way to do things.
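
To show what a queue limited to one concurrent operation amounts to, here is a rough pthread sketch in C: a single worker thread runs submitted jobs one after another, so database writes never overlap and the submitting thread never waits for them. This is not how NSOperationQueue is implemented; every name here (`serial_queue_*`, `record_write`, `struct write_log`) is invented for illustration.

```c
#include <pthread.h>
#include <stdlib.h>

typedef void (*job_fn)(void *context);

struct job {
    job_fn fn;
    void *context;
    struct job *next;
};

struct serial_queue {
    pthread_mutex_t lock;
    pthread_cond_t wake;
    struct job *head, *tail;
    int stopping;
    pthread_t worker;
};

/* Worker thread: takes jobs in FIFO order and runs them one at a time. */
static void *worker_main(void *arg)
{
    struct serial_queue *q = arg;
    for (;;) {
        pthread_mutex_lock(&q->lock);
        while (q->head == NULL && !q->stopping)
            pthread_cond_wait(&q->wake, &q->lock);
        struct job *j = q->head;
        if (j == NULL) {                 /* stopping and nothing left to run */
            pthread_mutex_unlock(&q->lock);
            return NULL;
        }
        q->head = j->next;
        if (q->head == NULL) q->tail = NULL;
        pthread_mutex_unlock(&q->lock);

        j->fn(j->context);               /* runs outside the lock */
        free(j);
    }
}

int serial_queue_start(struct serial_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->wake, NULL);
    q->head = q->tail = NULL;
    q->stopping = 0;
    return pthread_create(&q->worker, NULL, worker_main, q) == 0 ? 0 : -1;
}

/* Appends a job and returns immediately; the caller is never blocked by it. */
int serial_queue_submit(struct serial_queue *q, job_fn fn, void *context)
{
    struct job *j = malloc(sizeof *j);
    if (j == NULL) return -1;
    j->fn = fn;
    j->context = context;
    j->next = NULL;

    pthread_mutex_lock(&q->lock);
    if (q->tail) q->tail->next = j; else q->head = j;
    q->tail = j;
    pthread_cond_signal(&q->wake);
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Lets the worker finish the remaining jobs, then stops and joins it. */
void serial_queue_finish(struct serial_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->stopping = 1;
    pthread_cond_signal(&q->wake);
    pthread_mutex_unlock(&q->lock);
    pthread_join(q->worker, NULL);
    pthread_cond_destroy(&q->wake);
    pthread_mutex_destroy(&q->lock);
}

/* Hypothetical job for illustration: stands in for one database write. */
struct write_log { int values[16]; int count; };
struct write_request { struct write_log *log; int value; };

void record_write(void *context)
{
    struct write_request *r = context;
    if (r->log->count < 16)
        r->log->values[r->log->count++] = r->value;
}
```

Since only the worker ever runs the jobs, the jobs themselves need no mutex around the database, which is the property the NSOperationQueue suggestion relies on.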

NSInvocation Leaks

I am trying to set up an NSInvocation system to launch selectors into background threads using performSelectorInBackground:. So far everything is successful when running the system on instance methods (-), but I also want to support class methods (+). I have adjusted my code to provide an invokeInBackgroundThread for both kinds of method and everything worked except for one problem. When the class methods are invoked, my console is flooded with "autoreleased with no pool in place" messages. I have no idea what is causing it. The code, which is based on the DDFoundation open source project, is shown below.
@implementation NSObject (DDExtensions)
...
+ (id)invokeInBackgroundThread
{
DDInvocationGrabber *grabber = [DDInvocationGrabber invocationGrabber];
[grabber setInvocationThreadType:INVOCATION_BACKGROUND_THREAD];
return [grabber prepareWithInvocationTarget:self];
}
- (id)invokeInBackgroundThread
{
DDInvocationGrabber *grabber = [DDInvocationGrabber invocationGrabber];
[grabber setInvocationThreadType:INVOCATION_BACKGROUND_THREAD];
return [grabber prepareWithInvocationTarget:self];
}
...
...
- (void)forwardInvocation:(NSInvocation *)ioInvocation
{
[ioInvocation setTarget:[self target]];
[self setInvocation:ioInvocation];
if (_waitUntilDone == NO) {
[_invocation retainArguments];
}
if (_threadType == INVOCATION_MAIN_THREAD)
{
[_invocation performSelectorOnMainThread:@selector(invoke)
withObject:nil
waitUntilDone:_waitUntilDone];
} else {
[_invocation performSelectorInBackground:@selector(invoke)
withObject:nil];
}
}
...
+(void)doSomething;
[[className invokeInBackgroundThread] doSomething];
The main thread has an autorelease pool by default; if you start an extra thread, it's your job to create the pool. Actually, nothing complicated here, just
NSAutoreleasePool* pool = [[NSAutoreleasePool alloc] init];
// Work...
[pool release];
Also, if you have a lot of threads, I'd suggest you take a look at NSOperation instead of running threads with performSelectorInBackground:. NSOperation (with its wrapping queue) is a more flexible solution for such tasks.

Objective-C / Cocoa equivalent of C# ManualResetEvent

Is there an equivalent of the .NET ManualResetEvent class available for use in Objective-C / Cocoa?
I'm not very familiar with ManualResetEvent, but based on the documentation, it looks like the NSCondition class might be what you are looking for.
NSCondition is by no means an exact equivalent, but it does provide similar signaling functionality. You might also want to read up on NSLock.
Here is a wrapper class I created which emulates ManualResetEvent using NSCondition.
@interface WaitEvent : NSObject {
NSCondition *_condition;
BOOL _signaled;
}
- (id)initSignaled:(BOOL)signaled;
- (void)waitForSignal;
- (void)signal;
@end
@implementation WaitEvent
- (id)initSignaled:(BOOL)signaled
{
if (self = ([super init])) {
_condition = [[NSCondition alloc] init];
_signaled = signaled;
}
return self;
}
- (void)waitForSignal
{
[_condition lock];
while (!_signaled) {
[_condition wait];
}
[_condition unlock];
}
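- (void)signal
{
[_condition lock];
_signaled = YES;
// broadcast wakes every waiting thread, as a ManualResetEvent does;
// signal would wake only one of them.
[_condition broadcast];
[_condition unlock];
}
@end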
I've done just some basic testing but I think it should get the job done with much less ceremony.
I'll give you the sample code I would have liked to find yesterday (but couldn't find anywhere). If you want to create a producer/consumer class where the consumer is asynchronous, this is what you need to do :
You need to declare and allocate the NSConditionLock.
NSArray * data = [self getSomeData];
if ( [data count] == 0 ) {
NSLog(@"sendThread: Waiting...");
[_conditionLock lockWhenCondition:1];
[_conditionLock unlockWithCondition:0];
NSLog(@"sendThread: Back to life...");
}
else {
// Processing
}
And in the main code, when you add data and you want to unlock the other thread, you just have to add :
[_conditionLock lock];
[_conditionLock unlockWithCondition:1];
Note: I don't describe here how data is exchanged between the producer and the consumer. In my program it went through an SQLite/CoreData database, so thread sync is done at a higher level. But if you use an NSMutableDictionary, you need to protect it with an NSLock.
Ah, those are poor man's condition variables.
You could use the NSCondition class, but I think it's better
to go straight to the source. Start with pthread_cond_init.
You're gonna love it.
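
For anyone taking the pthread route, a manual-reset event could be sketched along these lines. This is an illustration only: the type and function names (`manual_reset_event`, `event_set`, `event_reset`, `event_wait`) are made up, and timeouts are left out.

```c
#include <pthread.h>
#include <stdbool.h>

/* Stays signaled after event_set() until event_reset() is called. */
struct manual_reset_event {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool signaled;
};

int event_init(struct manual_reset_event *e, bool signaled)
{
    if (pthread_mutex_init(&e->lock, NULL) != 0) return -1;
    if (pthread_cond_init(&e->cond, NULL) != 0) {
        pthread_mutex_destroy(&e->lock);
        return -1;
    }
    e->signaled = signaled;
    return 0;
}

/* Signals the event and releases every thread currently waiting on it. */
void event_set(struct manual_reset_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

/* Returns the event to the non-signaled state; later waiters block again. */
void event_reset(struct manual_reset_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

/* Blocks until the event is signaled; returns at once if it already is. */
void event_wait(struct manual_reset_event *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)                 /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

void event_destroy(struct manual_reset_event *e)
{
    pthread_cond_destroy(&e->cond);
    pthread_mutex_destroy(&e->lock);
}
```

The flag plus the while loop is what makes the event "sticky": a thread that calls `event_wait` after `event_set` does not block, which a bare condition variable would not give you.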