Performance of sorting NSURLs with localizedStandardCompare - objective-c

I need to sort an NSMutableArray containing NSURLs with localizedStandardCompare:
[array sortUsingComparator:^NSComparisonResult(id obj1, id obj2) {
NSString *f1 = [(NSURL *)obj1 absoluteString];
NSString *f2 = [(NSURL *)obj2 absoluteString];
return [f1 localizedStandardCompare:f2];
}];
This works fine, but I worry a bit about the performance: the block will be evaluated on the order of n log n times during the sort, so I'd like it to be fast (the array might have up to 100,000 elements). Since localizedStandardCompare: is only available on NSString, I need to convert the URLs to strings. Above, I use absoluteString, but there are other methods that return an NSString, for example relativeString. Reading the NSURL class reference, I get the impression that relativeString might be faster, since the URL does not need to be resolved, but this is my first time with Cocoa and OS-X, so that is just a wild guess.
Additional constraint: in this case, all URLs come from a NSDirectoryEnumerator on local storage, so all are file URLs. It would be a bonus if the method would work for all kinds of URL, though.
My question: which method should I use to convert NSURL to NSString for best performance?
Profiling all possible methods might be possible, but I have only one (rather fast) OS-X machine, and who knows - one day the code might end up on iOS.
I'm using Xcode 4.5.2 on OS-X 10.8.2, but the program should work on older version, too (within reasonable bounds).

You may need to use Carbon's FSCatalogSearch, which is faster than NSDirectoryEnumerator. As for getting the path, I see no choice.
The only thing you may consider for speeding up the sorting is that the paths are partially sorted, because the file system will return all the files of the same folder in alphabetical order.
So you may want to take all the paths of the same directory and merge them with the other results.
For example the home contents may be:
ab1.txt
bb.txt
c.txt
The documents directory may contain:
adf.txt
fgh.txt
So you just merge them with a customized algorithm, which just applies the merge part of a mergesort.
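The merge step described above can be sketched in plain C (which also compiles inside an Objective-C file). This is only an illustration under assumptions: merge_sorted is a made-up name, strcmp stands in for localizedStandardCompare:, and each per-directory listing is assumed to be already sorted under the same ordering that the merge uses, which the file system does not necessarily guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Merge two already-sorted arrays of C strings into `out`.
 * `out` must have room for na + nb pointers.
 * strcmp is an assumption: it stands in for the real comparison. */
static void merge_sorted(const char *const *a, size_t na,
                         const char *const *b, size_t nb,
                         const char **out)
{
    assert(a != NULL && b != NULL && out != NULL);
    size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb) {
        /* <= keeps the merge stable: on ties the element from `a` goes first */
        if (strcmp(a[i], b[j]) <= 0)
            out[k++] = a[i++];
        else
            out[k++] = b[j++];
    }
    /* At most one of the two inputs still has elements left. */
    while (i < na) out[k++] = a[i++];
    while (j < nb) out[k++] = b[j++];
}
```

Merging the home listing (ab1.txt, bb.txt, c.txt) with the documents listing (adf.txt, fgh.txt) this way takes one linear pass instead of a full sort. With many directories you would repeat the merge pairwise.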

I benchmarked the sort. It turned out that absoluteString and relativeString are much faster than path or relativePath.
Sorting about 26000 entries:
relativeString 550ms
absoluteString 580ms
path 920ms
relativePath 960ms
field access 480ms
For field access, I put the value of absoluteString into a field prior to the sort and access that. So, the ...String accessors are almost as fast as field access, and thus a good choice for my use case.
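For readers unfamiliar with the "field access" variant, here is a rough sketch of the idea in plain C: compute the sort key once per element before sorting, so the comparator only reads a stored field. Everything named here (keyed_item, make_key, sort_by_cached_key) is hypothetical, and the lower-casing in make_key merely stands in for the URL-to-string conversion; it is not the code that produced the timings above.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* One sortable record: the original value plus a key computed once. */
struct keyed_item {
    const char *original;  /* stands in for the NSURL */
    char *key;             /* stands in for the cached absoluteString */
};

/* Hypothetical conversion step: a lower-cased heap copy of `s`. */
static char *make_key(const char *s)
{
    size_t n = strlen(s);
    char *k = malloc(n + 1);
    assert(k != NULL);  /* sketch only: abort on allocation failure */
    for (size_t i = 0; i <= n; i++)  /* i == n copies the terminator */
        k[i] = (char)tolower((unsigned char)s[i]);
    return k;
}

/* Comparator that only reads the cached keys. */
static int compare_keys(const void *a, const void *b)
{
    const struct keyed_item *x = a, *y = b;
    return strcmp(x->key, y->key);
}

/* Sort in place: n conversions up front, then comparisons on cached keys. */
static void sort_by_cached_key(struct keyed_item *items, size_t n)
{
    for (size_t i = 0; i < n; i++)
        items[i].key = make_key(items[i].original);
    qsort(items, n, sizeof items[0], compare_keys);
    for (size_t i = 0; i < n; i++) {
        free(items[i].key);
        items[i].key = NULL;
    }
}
```

The conversion runs n times instead of roughly n log n times. Given the numbers above, the gain over calling absoluteString inside the comparator is small, so this is mostly worth it when the conversion itself is expensive.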

Related

coding efficiency vs execution efficiency

So I have these two methods:
-(void)importEvents:(NSArray*)allEvents {
NSMutableDictionary *subjectAssociation = [[NSMutableDictionary alloc] init];
for (id thisEvent in allEvents) {
if (classHour.SubjectShort && classHour.Subject) {
[subjectAssociation setObject: classHour.Subject forKey:classHour.SubjectShort];
}
}
[self storeSubjects:subjectAssociation];
}
-(void)storeSubjects:(NSMutableDictionary*)subjects {
NSArray *documentPaths = NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES);
NSString *documentsDir = [documentPaths objectAtIndex:0];
NSString *subjectsList = [documentsDir stringByAppendingPathComponent:@"Subjects.plist"];
[subjects writeToFile:subjectsList atomically:YES];
}
The first loops through an array of let's say 100 items, and builds a NSMutableDictionary of about 10 unique key/value pairs.
The second method writes this dictionary to a file for reference elsewhere in my app.
The first method is called quite often, and so is the second. However, I know, that once the dictionary is built and saved, its contents won't ever change, no matter how often I call these methods, since the number of possible values is just limited.
Question: given the fact that the second method essentially needs to be executed only once, should I add some lines that check if the file already exists, essentially adding code that needs to be executed, or can I just leave it as is, overwriting an existing file over and over again?
Should I care? I should add that I don't seem to suffer from any performance issues, so this is more of a philosophical/hygienic question.
thanks
It depends.
You say
once the dictionary is built and saved, its contents won't ever change
until they do :-)
If your app is not suffering from any performance issues in this particular loop, I wouldn't try to cache: unless you somehow remember that the file is written only once, you are storing up a bug for later.
This could be mitigated by using an intention-revealing name on the method, e.g.
-(void)storeSubjectsOnceOnlyPerLaunch:(NSDictionary*)subjects
If I got my time back for tracing down bugs caused by caching, I would have several days back in my life.
Your solution is totally over-engineered, and has tons of potential to go wrong. What if the user's drive is full? Does this file get backed up? Does it need backing up, or are you wasting the user's time backing it up? Can this fail? Are you handling it? You are concentrating on the entering and storing of data; you should be focusing on accessing that data.
I'd have a readwrite property allEvents and a property eventAssociations, declared readonly in the interface, but readwrite in the implementation file.
The allEvents setter stores allEvents and sets _eventAssociations to nil.
The eventAssociations getter checks whether _eventAssociations is nil and recalculates it when needed. A simple and bullet-proof pattern.
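The same invalidate-on-set, recompute-on-get pattern, sketched in plain C rather than with Objective-C properties. All names are hypothetical, and the derived value here (a count of distinct subject codes) is a stand-in for the real eventAssociations dictionary; the recomputations counter exists only to make the caching visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct event_store {
    const int *events;        /* borrowed array of subject codes */
    size_t event_count;
    bool derived_valid;       /* false plays the role of _eventAssociations == nil */
    size_t distinct_subjects; /* the cached derived value */
    unsigned recomputations;  /* instrumentation for this sketch only */
};

/* Setter: store the new input and invalidate the cache. */
static void store_set_events(struct event_store *s, const int *events, size_t n)
{
    assert(s != NULL);
    s->events = events;
    s->event_count = n;
    s->derived_valid = false;
}

/* Getter: recompute only if the cache was invalidated. */
static size_t store_distinct_subjects(struct event_store *s)
{
    assert(s != NULL);
    if (!s->derived_valid) {
        size_t distinct = 0;
        for (size_t i = 0; i < s->event_count; i++) {
            bool seen = false;
            for (size_t j = 0; j < i && !seen; j++)
                seen = (s->events[j] == s->events[i]);
            if (!seen)
                distinct++;
        }
        s->distinct_subjects = distinct;
        s->derived_valid = true;
        s->recomputations++;
    }
    return s->distinct_subjects;
}
```

Callers never decide when to rebuild: they just set the events and read the derived value, so there is no stale-cache state to remember.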

Bundle declarations into one statement or not?

Given the following Objective-C example, is it simply a matter of style and ease of reading to keep separate statements or to bundle them into one? Are there any actual benefits of either? Is it a waste of memory to declare individual variables?
NSDictionary *theDict = [anObject methodToCreateDictionary];
NSArray *theValues = [theDict allValues];
NSString *theResult = [theValues componentsJoinedByString:@" "];
or
NSString *theResult = [[[anObject methodToCreateDictionary] allValues] componentsJoinedByString:@" "];
I take the following into consideration when I declare a separate variable:
If I might want to see its value in the debugger.
If I am accessing the variable more than once.
If the line is too long.
There is no practical difference between the two approaches, however.
Also, you haven't asked directly about this, but be aware, when you access objects using dot notation, for example:
myObject.myObjectProperty1.myObjectProperty1Property;
If you are going to access myObjectProperty1Property more than once, it can be advisable to assign it to a local named variable. If you don't, the look-up will be executed more than once.
Now I can't emphasise enough that for many if not most situations this time saving is so infinitesimal as to seriously call into question whether it is worth even spending the time doing the extra typing for the assignment! So why am I raising this? Because, stylistic "anality" apart (I just made up a new word), if the section of code you are writing is running in a tight loop, it can be worth taking the extra care. An example would be the code which populates the cells in a UICollectionView that contains a large number of cells. Additionally, if you are using Core Data and you are using dot notation to refer to the properties of NSManagedObject properties, then there is far greater overhead with each and every look-up, in which case it is much more clearly worth taking the time to assign any values referred to by "nested" dot notation calls to a local variable first.

What is the most efficient way to compare an NSString in this way

I have an app (Cocoa Touch, Web Browser), however I need to be able to compare an NSString with thousands of other strings. Here's the deal.
When a WebView loads, I get the URL. I need to compare this URL with literally thousands of results (27,847). Each of those numbers represents a line of text in a plain text file.
I would like to know the best way to go about getting the data from the text file, and comparing it with the NSString. I need to know if the URL that the WebView is loading contains any of these strings.
The app needs to be very fast, so I can't just parse through every line in the text file, turn it into an array, and then compare each and every result.
Please share your ideas. Thanks.
I think the cleanest solution is to:
Create a web service that can offload the work to a server and return a response. Since it sounds like you're building a web protection service, your database may grow to be quite substantial over time, and you can just scale your server up to increase its speed. Furthermore, you don't want to have to update your app every time the lookup data changes.
Other options are:
Use a local SQLite database. SQL databases should perform lookups relatively fast.
If you don't want to use any database, have you tried putting all the search strings into an NSDictionary or NSMutableDictionary object? This way, you would just check if the valueForKey: for the string you're searching for is nil.
Sample code for this:
NSDictionary *searchDictionary = [NSDictionary dictionaryWithObjectsAndKeys:
[NSNumber numberWithBool:YES], @"google.com",
[NSNumber numberWithBool:YES], @"yahoo.com",
[NSNumber numberWithBool:YES], @"bing.com",
nil];
NSString *searchString = @"bing.com";
if ([searchDictionary valueForKey:searchString]) {
// search string found
} else {
// search string not found
}
Note: if you want the NSDictionary to perform case-insensitive comparisons, pre-load all values lowercase, and make the search string lowercase when using valueForKey:.
How much memory this could take is a whole other story, but I don't see how this comparison could be made much faster locally. I strongly recommend the remote web service approach, though.
Create a string from the file and enumerate through the lines.
NSString *stringToCheck;
NSData *bytesOfFile = [NSData dataWithContentsOfFile:@"/path/myfile.txt"];
NSString *fileString = [[NSString alloc] initWithData:bytesOfFile
encoding:NSUTF8StringEncoding];
__block BOOL foundMatch = NO;
[fileString enumerateLinesUsingBlock:^(NSString *line, BOOL *stop){
if([stringToCheck isEqualToString:line]){
*stop = YES;
foundMatch = YES;
}
}];
This is a job for regular expressions. Take all of the substrings you're looking for/filtering against, escape them appropriately (escaping characters such as [, ], |, and \, among others, with \), and join them with a |. The resulting string is your regular expression, which you apply to each URL.
You could loop through an entire array full of substrings, doing rangeOfString:options: with each one, but that's the slow way. A good regular expression implementation is built for this sort of thing, and I would hope that Apple's implementation is suitable.
That said, profile the hell out of it. I've seen some regex implementations choke on the | operator, so you'll want to make sure that Apple's is not one of them.
If you need to compare each string in your text file, you are going to have to compare it, no way around it.
What you can do however is do it on a background thread while showing some loading or something, and it won't feel as if the app got stuck.
I would suggest you try with NSDictionary first. You can load up all your URLs into this, and internally it will use some sort of hash table/map for very quick (O(1)) lookup.
You can then check the result of [dictionary objectForKey:userURL], and if it returns something then the URL matched one in the dictionary.
The only problem with this is that it requires an exact string match. If your dictionary contains http://server/foobar and the user enters http://server/FOOBAR (because it's a case-insensitive server), you are going to get a miss on your lookup. Similarly, adding ?foobar queries to the end of URLs will result in a miss. You could also add an explicit port with server:80, and with %XX character encoding you can create hundreds of variations of the same URL. You will have to account for this and canonicalize both the URLs in your dictionary, and the URL entered by the user prior to lookup.

Should I use an intermediate temp variable when appending to an NSString?

This works -- it does compile -- but I just wanted to check if it would be considered good practice or something to be avoided?
NSString *fileName = @"image";
fileName = [fileName stringByAppendingString:@".png"];
NSLog(@"TEST : %@", fileName);
OUTPUT: TEST : image.png
Might be better written with a temporary variable:
NSString *fileName = @"image";
NSString *tempName;
tempName = [fileName stringByAppendingString:@".png"];
NSLog(@"TEST : %@", tempName);
just curious.
Internally, compilers will normally break your code up into a representation called "Static Single Assignment" (SSA), where a given variable is only ever assigned one value and all statements are as simple as possible (compound elements are separated out into different lines). Your second example follows this approach.
Programmers do sometimes write like this. It is considered the clearest way of writing code since you can write all statements as basic tuples: A = B operator C. But it is normally considered too verbose for code that is "obvious", so it is an uncommon style (outside of situations where you're trying to make very cryptic code comprehensible).
Generally speaking, programmers will not be confused by your first example, and it is considered acceptable where you don't need the original fileName again. However, many Obj-C programmers encourage the following style:
NSString *fileName = [@"image" stringByAppendingString:@".png"];
NSLog(@"TEST : %@", fileName);
or even (depending on horizontal space on the line):
NSLog(@"TEST : %@", [@"image" stringByAppendingString:@".png"]);
i.e. if you only use a variable once, don't name it (just use it in place).
On a stylistic note though, if you were following the Static Single Assignment approach, you shouldn't use tempName as your variable name since it doesn't explain the role of the variable -- you'd instead use something like fileNameWithExtension. In a broader sense, I normally avoid using "temp" as a prefix since it is too easy to start naming everything "temp" (all local variables are temporary so it has little meaning).
The first line is declaring an NSString literal. It has storage that lasts the lifetime of the process, so doesn't need to be released.
The call to stringByAppendingString returns an autoreleased NSString. That should not be released either, but will last until it gets to the next autorelease pool drain.
So assigning the result of the stringByAppendingString: call back to the fileName pointer is perfectly fine in this case. In general, however, you should check what your object lifetimes are, and handle them accordingly (e.g. if fileName had been declared as a string that you own, you would need to release it, so using a temp would be necessary).
The other thing to check is whether you're doing anything with fileName after this snippet, e.g. holding on to it in an instance variable, in which case you will need to retain it.
The difference is merely whether you still need the reference to the literal string or not. From the memory management POV and the object creational POV it really shouldn't matter. One thing to keep in mind though is that the second example makes it slightly easier when debugging. My preferred version would look like this:
NSString *fileName = @"image";
NSString *tempName = [fileName stringByAppendingString:@".png"];
NSLog(@"TEST : %@", tempName);
But in the end this is just a matter of preference.
I think you're right, this really comes down to preferred style.
Personally I like your first example: the code's not complicated, and the first version is concise and easier on the eyes. There's too much of the 'language' hiding what it's doing in the second example.
As noted, memory management doesn't seem to be an issue in the examples.

Why are there more files/hardlinks with the same iNode than the reference count shows?

I have recursed a folder on a single volume, and retrieved a list of filenames, reference-counts and inode numbers, using
NSFileManager attributesOfItemAtPath
and NSDictionary fileSystemFileNumber and objectForKey:NSFileReferenceCount
For some reason I am getting results such as a reference count of 10, but a list of many more than 10 files with the same iNode number.
Of note is that I am not including SymLinks in my list, I'm only recording a file when [dict fileType] == NSFileTypeRegular
Any ideas why this might be the case?
Edit: @Peter Hosey, I'm writing the iNode and reference count as follows:
CLMFileManagedObj *clmf;
clmf = (CLMFileManagedObj *)[NSEntityDescription insertNewObjectForEntityForName:@"CLMFile" inManagedObjectContext:moc];
NSUInteger fsfn = [dict fileSystemFileNumber];
[clmf setValue:[NSNumber numberWithUnsignedInteger:fsfn] forKey:@"iNodeNumber"];
[clmf setValue:(NSNumber*)[dict objectForKey:NSFileReferenceCount] forKey:@"referenceCount"];
Note that the reason iNodeNumber and referenceCount are being written slightly differently is that [dict] offers a direct (NSUInteger)fileSystemFileNumber get-method, whereas the fileReferenceCount needs to be retrieved using keys (according to any help I could find on NSDictionary)
Both properties of the CLMFile entity are Int 64. From what I can tell, NSUInteger's type is dependent on whether running 32 or 64 bit mode, but [NSNumber numberWithUnsignedInteger] accepts NSUInteger as the argument, so I'd assume it deals with the number correctly in either mode.
I can't see where in Activity Monitor it says whether it's 32/64 bit. I'd assume whatever the default for XCode 3.1.3 projects are.
It's possible I'm missing something here, as I'm relatively new to Mac/Obj-C/XCode/Cocoa, so any help/pointers would be appreciated. Experienced programmer, but not in this environment (though learning as fast as I can....)
Are you looking at Time Machine backups? Are there directory hardlinks involved?
If directory A contains directories B1 and B2 which are hardlinked, a file with the same inode would be inside both B1 and B2, yet the ref count could be one.