I have a huge word list of over 280,000 words that is loaded from an SQLite database into an NSArray. Then I do a fast enumeration to check whether a certain string value entered by the user matches one of the words in the array. Since the array is so large, it takes about 1-2 seconds on the iPhone 4 to go through it.
How can I improve the performance? Maybe I should make several smaller arrays, one for each letter of the alphabet, so that there is less data to go through?
This is how my database class looks:
static WordDatabase *_database;
+(WordDatabase *) database
{
if (_database == nil) {
_database = [[WordDatabase alloc] init];
}
return _database;
}
- (id) init
{
if ((self = [super init])) {
NSString *sqLiteDb = [[NSBundle mainBundle] pathForResource:@"dictionary" ofType:@"sqlite"];
if (sqlite3_open([sqLiteDb UTF8String], &_database) != SQLITE_OK) {
NSLog(@"Failed to open database!");
}
}
return self;
}
- (NSArray *)dictionaryWords {
NSMutableArray *retval = [[[NSMutableArray alloc] init] autorelease];
NSString *query = @"SELECT word FROM words";
sqlite3_stmt *statement;
if (sqlite3_prepare_v2(_database, [query UTF8String], -1, &statement, nil) == SQLITE_OK) {
while (sqlite3_step(statement) == SQLITE_ROW) {
char *wordChars = (char *) sqlite3_column_text(statement, 0);
// autoreleased, so the string is not leaked under manual reference counting
NSString *name = [[NSString stringWithUTF8String:wordChars] uppercaseString];
[retval addObject:name];
}
sqlite3_finalize(statement);
}
return retval;
}
Then in my main view I initialise it like this:
dictionary = [[NSArray alloc] initWithArray:[WordDatabase database].dictionaryWords];
And finally I go through the array using this method:
- (void) checkWord
{
NSString *userWord = formedWord.wordLabel.string;
NSLog(@"checking dictionary for %@", userWord);
for (NSString *word in dictionary) {
if ([userWord isEqualToString: word]) {
NSLog(@"match found");
}
}
}
There are lots of different ways:
Put all the words in a dictionary or set; testing for presence is then a fast hash lookup instead of a scan of the whole array.
Break it up as you suggest, or create a tree-type structure of some kind.
Use the database to do the search. Databases are generally pretty good at exactly that, if the table is constructed correctly (e.g. with an index on the word column).
If space isn't an issue, store a hash value of each word and use that for your base lookup. Once filtered by the hash, compare the actual words. This reduces the number of costly string comparisons, and hashes are easier to index/sort and give quick lookups.
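To make the set and hash suggestions concrete, here is a minimal sketch in plain C (a subset of Objective-C, so it also compiles in a .m file). It is not a drop-in replacement for NSSet: the names WordSet, fnv1a, wordset_init and wordset_contains are hypothetical, FNV-1a is just one simple hash choice, and the words are assumed to be already-uppercased, NUL-terminated strings that outlive the set. Each slot keeps the word's hash, so most probes are rejected by an integer comparison before any strcmp() runs.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Open-addressing hash set. A NULL word marks an empty slot; each
   occupied slot also keeps the word's hash for cheap pre-filtering. */
typedef struct {
    const char **words;
    uint32_t *hashes;
    size_t capacity;   /* always a power of two */
} WordSet;

/* FNV-1a: a simple, fast, non-cryptographic string hash. */
static uint32_t fnv1a(const char *s) {
    uint32_t h = 2166136261u;
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}

/* Finds the slot holding word, or the empty slot where it would go.
   Terminates because the table is never more than half full. */
static size_t wordset_slot(const WordSet *set, const char *word, uint32_t h) {
    size_t mask = set->capacity - 1;
    size_t slot = h & mask;
    while (set->words[slot] != NULL) {
        /* compare hashes first; strcmp only on a hash match */
        if (set->hashes[slot] == h && strcmp(set->words[slot], word) == 0)
            break;
        slot = (slot + 1) & mask;   /* linear probing */
    }
    return slot;
}

/* Builds the set from n words. Strings are not copied, so they must
   outlive the set. Returns 0 on success, -1 on allocation failure
   (call wordset_free either way). */
static int wordset_init(WordSet *set, const char **list, size_t n) {
    size_t cap = 16;
    while (cap < 2 * n)
        cap *= 2;
    set->capacity = cap;
    set->words = calloc(cap, sizeof *set->words);
    set->hashes = calloc(cap, sizeof *set->hashes);
    if (set->words == NULL || set->hashes == NULL)
        return -1;
    for (size_t i = 0; i < n; i++) {
        uint32_t h = fnv1a(list[i]);
        size_t slot = wordset_slot(set, list[i], h);
        set->words[slot] = list[i];   /* a duplicate just reuses its slot */
        set->hashes[slot] = h;
    }
    return 0;
}

static int wordset_contains(const WordSet *set, const char *word) {
    uint32_t h = fnv1a(word);
    return set->words[wordset_slot(set, word, h)] != NULL;
}

static void wordset_free(WordSet *set) {
    free(set->words);
    free(set->hashes);
}
```

Building the set is one pass over the list, and each lookup then touches only a handful of slots instead of all 280,000 words. In Objective-C, [NSSet setWithArray:] followed by containsObject: gives you the same behaviour without writing any of this.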
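I second a dictionary: NSDictionary in Objective-C. Checking a word is then a single [myDict objectForKey:userWord] call instead of a loop over every entry.
For instance, this is how you would print all its contents:
// To print out all key-value pairs in the NSDictionary myDict
for (id key in myDict)
NSLog(@"key=%@ value=%@", key, [myDict objectForKey:key]);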
In Objective-C, I have an object, e.g. Person, with a lot of fields: firstName, lastName, phoneNumber, address, city, and so on. These fields are NSStrings and any of them could be nil.
Now I want to concatenate my field values into another NSString:
Person *p = ...
NSMutableString *s = [[NSMutableString alloc] init];
for (NSString *field in @[p.firstName, p.lastName, p.phoneNumber,
p.adress, p.city, ....more fields...]) {
if ([field length] > 0) {
[s appendFormat:@"%@\n", field];
}
}
The issue is that this code crashes whenever one of the fields is nil. I get this exception:
[__NSPlaceholderArray initWithObjects:count:]: attempt to insert nil object from objects[0]
How can I simply handle nil values within my for loop?
I agree with @TomPace's post; for this small number of fields I would use a simple if/else.
However, there may be times when you do need to loop through a list of fields.
It's a bad idea to blindly pull the values into an array, because you could end up trying to insert nil values into it. In this case it is better to put the field names into an array of keys as strings and loop through that list, using valueForKey: to access the values. I would possibly store the list of keys somewhere else where it can be reused.
Person *p = ...
NSMutableString *s = [[NSMutableString alloc] init];
NSArray *keys = @[@"firstName", @"lastName", @"phoneNumber", @"adress", @"city"];
for (NSString *key in keys)
{
NSString *value = [p valueForKey:key];
if ([value length] > 0) {
[s appendFormat:@"%@\n", value];
}
}
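For comparison, the same skip-the-missing-fields loop in plain C, where an array of pointers is allowed to contain NULL (unlike an NSArray, which rejects nil). This is a sketch: join_fields and its fixed-size output buffer are hypothetical, and NULL and empty strings are both treated as "no value", mirroring the [value length] > 0 test above.

```c
#include <stddef.h>
#include <string.h>

/* Appends each non-empty field plus '\n' to out (capacity outsz).
   Returns the number of characters written, or -1 if out is too small. */
static int join_fields(const char *const fields[], size_t count,
                       char *out, size_t outsz) {
    size_t len = 0;
    if (outsz == 0)
        return -1;
    out[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        const char *f = fields[i];
        if (f == NULL || f[0] == '\0')
            continue;                       /* like [field length] > 0 */
        size_t flen = strlen(f);
        if (len + flen + 2 > outsz)         /* field + '\n' + NUL */
            return -1;
        memcpy(out + len, f, flen);
        len += flen;
        out[len++] = '\n';
        out[len] = '\0';
    }
    return (int)len;
}
```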
Person *person = [[Person alloc] init];
person.firstName = nil;
person.lastName = @"lastName";
NSMutableString *s = [[NSMutableString alloc] init];
[s appendFormat:@"%@\n", person.firstName == nil?@"":person.firstName];
[s appendFormat:@"%@\n", person.lastName == nil?@"":person.lastName];
For a selection of fields this small, don't use a for loop.
You may save a bit of code with the for-loop structure, but it's really not the way to go if you're building the NSArray from only a few fields, especially because you can't put nil items in it.
A better way to go is:
Person *p = ...
NSMutableString *s = [[NSMutableString alloc] init];
if ([p.firstName length] > 0) [s appendFormat:@"%@\n", p.firstName];
if ([p.lastName length] > 0) [s appendFormat:@"%@\n", p.lastName];
if ([p.phoneNumber length] > 0) [s appendFormat:@"%@\n", p.phoneNumber];
if ([p.adress length] > 0) [s appendFormat:@"%@\n", p.adress];
if ([p.city length] > 0) [s appendFormat:@"%@\n", p.city];
Edit, after the original question was updated with a large number of fields:
Like @BergQuester said, one approach that supports a larger, arbitrary set of fields is key-value coding (KVC), i.e. looking each value up by name with valueForKey:.
NSArray *fieldNames = @[@"firstName", @"lastName", @"phoneNumber", ....more fields...];
NSString *field;
for (NSString *fieldName in fieldNames) {
field = [p valueForKey:fieldName];
if ([field length] > 0 ) {
[s appendFormat: @"%@\n", field];
}
}
Try creating an NSMutableString category:
#import "NSMutableString+checkForNilObject.h"
@implementation NSMutableString (checkForNilObject)
-(void) appendNotNillObject:(NSString *) string
{
if(string)
{
[self appendString:string];
}
}
@end
You can override the getters of the Person class:
@implementation Person
- (NSString *)firstName {
if (_firstName == nil)
_firstName = @"";
return _firstName;
}
// ...other getters
@end
This way you can define all your getters here.
This code works fine, but each time I look at it, I die a little bit inside. :(
Can you help me streamline it a bit?
Is there a more elegant way to .append(SomeString)? (To give you some perspective, the code prints the elements of a linked list.)
- (NSString *) description {
Node* tempNode = [self firstNode];
if (tempNode == nil) {
return @"List contains no elements";
}
NSString *desc= [[NSString alloc] initWithString:@"(null) -- "];
desc = [desc stringByAppendingString:[firstNode nodeCharacter]];
desc = [desc stringByAppendingString:@" -- "];
while ([tempNode nextNode] != nil) {
desc = [desc stringByAppendingString:[[tempNode nextNode]nodeCharacter]];
desc = [desc stringByAppendingString:@" -- "];
tempNode = [tempNode nextNode];
}
return [desc stringByAppendingString:@" (null)"];
}
First of all, if you want to build a string step by step or modify it, stop using stringByAppendingString: and use NSMutableString instead of NSString!
Then, for your case, you can even use stringWithFormat: to build part of your string.
Finally, you forgot to manage your memory: you alloc/init your string but never release it (and as you reassign the desc variable on the next line, you lose track of the allocated memory and have a leak).
So here is the revised code:
- (NSString *) description {
Node* tempNode = [self firstNode];
if (tempNode == nil) {
return @"List contains no elements";
}
NSMutableString *desc = [NSMutableString stringWithFormat:@"(null) -- %@ -- ",[firstNode nodeCharacter]];
while ((tempNode = [tempNode nextNode]) != nil) {
[desc appendFormat:@"%@ -- ",[tempNode nodeCharacter]];
}
[desc appendString:@" (null)"];
return desc;
}
(Note that you could also build an NSArray of the nodes' characters and use componentsJoinedByString: at the end, so you have multiple possibilities here anyway.)
Yes, use an NSMutableString, which involves far fewer memory allocations.
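To show why a mutable string means fewer allocations, here is a minimal growable buffer in plain C. StrBuf and strbuf_append are hypothetical names, and the doubling policy is an assumption for illustration; NSMutableString's actual growth strategy is an implementation detail. Because the capacity doubles whenever it runs out, appending n pieces costs only a logarithmic number of reallocations, whereas stringByAppendingString: creates a brand-new string, copying everything so far, on every call.

```c
#include <stdlib.h>
#include <string.h>

/* Growable, always NUL-terminated string buffer. Zero-initialize it. */
typedef struct {
    char *data;
    size_t len;     /* characters stored, excluding the NUL */
    size_t cap;     /* bytes allocated */
    int reallocs;   /* growth steps taken, counted for illustration */
} StrBuf;

/* Appends s, doubling the capacity as needed. Returns 0 or -1 (no memory). */
static int strbuf_append(StrBuf *b, const char *s) {
    size_t n = strlen(s);
    if (b->len + n + 1 > b->cap) {
        size_t cap = b->cap ? b->cap : 16;
        while (b->len + n + 1 > cap)
            cap *= 2;
        char *p = realloc(b->data, cap);   /* realloc(NULL, ...) acts as malloc */
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = cap;
        b->reallocs++;
    }
    memcpy(b->data + b->len, s, n + 1);    /* copy including the NUL */
    b->len += n;
    return 0;
}
```

Appending " -- " pieces for a thousand nodes this way needs about ten reallocations in total.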
- (NSString *) description {
Node* tempNode = [self firstNode];
if (tempNode == nil) {
return @"List contains no elements";
}
//Autorelease string to prevent memory leak
NSMutableString *desc= [NSMutableString stringWithString:@"(null) -- "];
[desc appendString:[firstNode nodeCharacter]];
[desc appendString:@" -- "];
while ([tempNode nextNode] != nil) {
[desc appendString:[[tempNode nextNode]nodeCharacter]];
[desc appendString:@" -- "];
tempNode = [tempNode nextNode];
}
[desc appendString:@" (null)"];
return desc;
}
Rather than using the while loop, take a look at Cocoa's fast enumeration. It is already supported by NSArray, and allows you to rapidly enumerate through array elements:
for (id object in myArray) {
// Do something with object
}
You can adopt fast enumeration (the NSFastEnumeration protocol) or use NSEnumerator in your list class, then loop through the nodes:
// Use mutable strings (autoreleased, so nothing leaks)
NSMutableString *desc = [NSMutableString stringWithString:@"(null) -- "];
for (Node *node in nodeList) {
[desc appendString:[node nodeCharacter]];
[desc appendString:@" -- "];
}
[desc appendString:@" (null)"];
return desc;
Have you looked at NSMutableString? It has an -appendString: method.
Edit: You could also use a recursive method on your node to traverse the list and build up the string. I would make a simple public method like - (NSString *)description that calls [self recursiveDescriptionWithSubnode:[self nextNode]], and use a private method internally to do the dirty work, like so:
- (NSString *)recursiveDescriptionWithSubnode:(Node *)node {
if(!node) {
return [self nodeCharacter];
}
else {
// recurse on the subnode, not on self, so the traversal actually advances
return [[self nodeCharacter] stringByAppendingString:[node recursiveDescriptionWithSubnode:[node nextNode]]];
}
}
Note that this isn't tail recursive and so for a long list this would build up a sizable autorelease pool and call stack. Making it tail recursive is left as an exercise for the reader (but you could use NSMutableString to do it).
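One way to do that exercise, sketched in plain C with a hypothetical Node struct standing in for the Node class: the partial result travels in an accumulator (a caller-supplied buffer playing the role of the NSMutableString), and the recursive call is the last thing the function does. Whether a compiler actually turns such a call into a jump depends on the compiler and optimization level; C does not guarantee it.

```c
#include <stdio.h>

/* Hypothetical stand-in for the question's Node class. */
typedef struct Node {
    const char *character;   /* plays the role of nodeCharacter */
    struct Node *next;       /* plays the role of nextNode */
} Node;

/* Tail-recursive helper: appends "<character> -- " for node and every
   node after it, starting at offset len. The accumulated text lives in
   out. Precondition: len < cap. Returns the final length. */
static size_t describe_from(const Node *node, char *out, size_t len, size_t cap) {
    if (node == NULL)
        return len;
    int w = snprintf(out + len, cap - len, "%s -- ", node->character);
    if (w < 0 || (size_t)w >= cap - len) {
        out[len] = '\0';   /* out of room: drop the partial piece and stop */
        return len;
    }
    return describe_from(node->next, out, len + (size_t)w, cap);  /* tail call */
}

/* Public entry point; cap must be at least 1. */
static size_t describe_list(const Node *head, char *out, size_t cap) {
    out[0] = '\0';
    return describe_from(head, out, 0, cap);
}
```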
Instead of building up a string, you could create an array of strings, that you ultimately return as a single, joined string separating each value with your " -- " string.
If you know how many elements you might have, you could create the array with that capacity, which might be slightly more efficient under the hood for NSMutableArray.
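A plain-C sketch of the join-at-the-end idea, roughly what componentsJoinedByString: does for you: measuring the total length first means the result is allocated exactly once, which is the same motivation as giving NSMutableArray an initial capacity. join_strings is a hypothetical helper, and the parts are assumed to be NUL-terminated strings.

```c
#include <stdlib.h>
#include <string.h>

/* Joins count parts with sep between them. The caller frees the result;
   returns NULL if the allocation fails. */
static char *join_strings(const char *const parts[], size_t count, const char *sep) {
    size_t seplen = strlen(sep);
    size_t total = 1;                        /* room for the NUL */
    for (size_t i = 0; i < count; i++)
        total += strlen(parts[i]) + (i > 0 ? seplen : 0);
    char *out = malloc(total);               /* the only allocation */
    if (out == NULL)
        return NULL;
    char *p = out;
    for (size_t i = 0; i < count; i++) {
        if (i > 0) {
            memcpy(p, sep, seplen);
            p += seplen;
        }
        size_t n = strlen(parts[i]);
        memcpy(p, parts[i], n);
        p += n;
    }
    *p = '\0';
    return out;
}
```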
I've created a custom sorting by creating a new category for the NSString class. Below is my code.
@implementation NSString (Support)
- (NSComparisonResult)sortByPoint:(NSString *)otherString {
int first = [self calculateWordValue:self];
int second = [self calculateWordValue:otherString];
if (first > second) {
return NSOrderedAscending;
}
else if (first < second) {
return NSOrderedDescending;
}
return NSOrderedSame;
}
- (int)calculateWordValue:(NSString *)word {
int totalValue = 0;
NSString *pointPath = [[NSBundle mainBundle] pathForResource:@"pointvalues" ofType:@"plist"];
NSDictionary *pointDictionary = [[NSDictionary alloc] initWithContentsOfFile:pointPath];
for (int index = 0; index < [word length]; index++) {
char currentChar = [word characterAtIndex:index];
NSString *individual = [[NSString alloc] initWithFormat:#"%c",currentChar];
individual = [individual uppercaseString];
NSArray *numbersForKey = [pointDictionary objectForKey:individual];
NSNumber *num = [numbersForKey objectAtIndex:0];
totalValue += [num intValue];
// cleanup
individual = nil;
numbersForKey = nil;
num = nil;
}
return totalValue;
}
@end
To explain: I create a point dictionary, based on a plist, that determines the point value associated with each letter of the alphabet. Then in my view controller, I call
NSArray *sorted = [words sortedArrayUsingSelector:@selector(sortByPoint:)];
to sort my table of words by their point values. However, creating a new dictionary each time the -sortByPoint: method is called is extremely inefficient. Is there a way to create the pointDictionary beforehand and use it for each subsequent call to -calculateWordValue:?
This is a job for the static keyword. If you do this:
static NSDictionary *pointDictionary = nil;
if (pointDictionary==nil) {
NSString *pointPath = [[NSBundle mainBundle] pathForResource:@"pointvalues" ofType:@"plist"];
pointDictionary = [[NSDictionary alloc] initWithContentsOfFile:pointPath];
}
pointDictionary will be persistent for the lifetime of your app.
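The static keyword behaves the same way in plain C, from which Objective-C inherits it: a function-local static is initialized once and keeps its value between calls. In this sketch the table and its scores are made up as a stand-in for pointvalues.plist, and the letter arithmetic assumes ASCII. Like the Objective-C version, the lazy initialization is not thread-safe; in Objective-C, dispatch_once is the usual way to make it so.

```c
/* Hypothetical letter scores, built once on the first call. */
static int letter_value(char c) {
    static int table[26];        /* zero-initialized, lives for the whole run */
    static int initialized = 0;  /* set once, never reset */
    if (!initialized) {
        /* Stand-in for reading pointvalues.plist: made-up scores. */
        for (int i = 0; i < 26; i++)
            table[i] = 1;
        table['K' - 'A'] = 5;
        table['Q' - 'A'] = 10;
        table['Z' - 'A'] = 10;
        initialized = 1;
    }
    if (c >= 'a' && c <= 'z')
        c = (char)(c - 'a' + 'A');   /* uppercase, assuming ASCII */
    if (c < 'A' || c > 'Z')
        return 0;                    /* non-letters score nothing */
    return table[c - 'A'];
}

/* Sums the letter values; the table is built only on the very first call. */
static int word_value(const char *word) {
    int total = 0;
    for (; *word != '\0'; word++)
        total += letter_value(*word);
    return total;
}
```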
One other optimization is to build a cache of scores by using this against each of your words:
[dict setObject:[NSNumber numberWithInt:[word calculateWordValue:word]] forKey:word];
Then use the keysSortedByValueUsingSelector: method to extract your list of words (note the selector should be compare:, since the objects being compared are the NSNumbers).
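The cache-the-scores idea, sketched in plain C with an array of (word, score) pairs instead of an NSDictionary: each word is scored once up front, and the comparison function only reads cached integers. The rule in score_word is a made-up placeholder for the plist lookup, and the descending order with an alphabetical tie-break is an assumption chosen to match sortByPoint:.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *word;
    int score;   /* computed once per word, not once per comparison */
} ScoredWord;

/* Placeholder scoring rule (an assumption): vowels 3, other chars 1. */
static int score_word(const char *w) {
    int s = 0;
    for (; *w != '\0'; w++)
        s += strchr("AEIOUaeiou", *w) != NULL ? 3 : 1;
    return s;
}

/* Highest score first; ties broken alphabetically. */
static int by_score_desc(const void *a, const void *b) {
    const ScoredWord *x = a;
    const ScoredWord *y = b;
    if (x->score != y->score)
        return x->score < y->score ? 1 : -1;
    return strcmp(x->word, y->word);
}

/* Scores each of the n words exactly once, then sorts the pairs. */
static void sort_by_cached_score(ScoredWord *items, const char **words, size_t n) {
    for (size_t i = 0; i < n; i++) {
        items[i].word = words[i];
        items[i].score = score_word(words[i]);
    }
    qsort(items, n, sizeof *items, by_score_desc);
}
```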
Finally, the word argument on your method is redundant. Use self instead:
-(int)calculateWordValue {
...
for (int index = 0; index < [self length]; index++)
{
char currentChar = [self characterAtIndex:index];
...
}
...
}
Change your sortByPoint:(NSString *)otherString method to take the dictionary as a parameter, and pass it your pre-created dictionary.
- (NSComparisonResult)sortByPoint:(NSString *)otherString withDictionary:(NSDictionary *)pointDictionary
EDIT: This won't work, because sortedArrayUsingSelector: passes only one argument to the selector. Apologies. Instead, you may be better off creating a wrapper class for your point dictionary as a singleton, which you then obtain a reference to each time your sort function runs.
In calculateWordValue:
NSDictionary *pointDictionary = [[DictWrapper sharedInstance] dictionary];
DictWrapper has an NSDictionary as a property, and a class method sharedInstance (to return the singleton). You have to set that dictionary and pre-initialize it before you do your first sort.
How can I optimise out this nested for loop?
The program should go through each word in the array created from the word text file, and if it's greater than 8 characters, add it to the goodWords array. But the caveat is that I only want the root word to be in the goodWords array, for example:
If greet is added to the array, I don't want greets or greetings or greeters, etc.
NSString *string = [NSString stringWithContentsOfFile:@"/Users/james/dev/WordParser/word.txt" encoding:NSUTF8StringEncoding error:NULL];
NSArray *words = [string componentsSeparatedByString:@"\r\n"];
NSMutableArray *goodWords = [NSMutableArray array];
BOOL shouldAddToGoodWords = YES;
for (NSString *word in words)
{
NSLog(@"Word: %@", word);
if ([word length] > 8)
{
NSLog(@"Word is greater than 8");
for (NSString *existingWord in [goodWords reverseObjectEnumerator])
{
NSLog(@"Existing Word: %@", existingWord);
if ([word rangeOfString:existingWord].location != NSNotFound)
{
NSLog(@"Not adding...");
shouldAddToGoodWords = NO;
break;
}
}
if (shouldAddToGoodWords)
{
NSLog(@"Adding word: %@", word);
[goodWords addObject:word];
}
}
shouldAddToGoodWords = YES;
}
How about something like this?
//load the words from wherever
NSString * allWords = [NSString stringWithContentsOfFile:@"/usr/share/dict/words" encoding:NSUTF8StringEncoding error:NULL];
//create a mutable array of the words
NSMutableArray * words = [[allWords componentsSeparatedByCharactersInSet:[NSCharacterSet newlineCharacterSet]] mutableCopy];
//remove any words that are shorter than 8 characters
[words filterUsingPredicate:[NSPredicate predicateWithFormat:@"length >= 8"]];
//sort the words in ascending order
[words sortUsingSelector:@selector(caseInsensitiveCompare:)];
//create a set of indexes (these will be the non-root words)
NSMutableIndexSet * badIndexes = [NSMutableIndexSet indexSet];
//remember our current root word
NSString * currentRoot = nil;
NSUInteger count = [words count];
//loop through the words
for (NSUInteger i = 0; i < count; ++i) {
NSString * word = [words objectAtIndex:i];
if (currentRoot == nil) {
//base case
currentRoot = word;
} else if ([word hasPrefix:currentRoot]) {
//word is a non-root word. remember this index to remove it later
[badIndexes addIndex:i];
} else {
//no match. this word is our new root
currentRoot = word;
}
}
//remove the non-root words
[words removeObjectsAtIndexes:badIndexes];
NSLog(@"%@", words);
[words release];
This runs very very quickly on my machine (2.8GHz MBP).
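The same sort-then-scan algorithm in plain C, as a sketch: once the list is sorted, every word that starts with a root sits directly after that root, so a single pass comparing each word with the current root is enough. keep_root_words is a hypothetical name; unlike the Objective-C version it compares case-sensitively, and it compacts the surviving roots to the front of the array instead of building an index set.

```c
#include <stdlib.h>
#include <string.h>

static int cmp_str(const void *a, const void *b) {
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sorts words, drops those shorter than min_len, and keeps only words
   that do not start with the most recently kept root. Roots are moved
   to the front of the array; returns how many there are. */
static size_t keep_root_words(const char **words, size_t n, size_t min_len) {
    size_t kept = 0;
    const char *root = NULL;
    qsort(words, n, sizeof *words, cmp_str);
    for (size_t i = 0; i < n; i++) {
        const char *w = words[i];
        if (strlen(w) < min_len)
            continue;                                   /* too short */
        if (root != NULL && strncmp(w, root, strlen(root)) == 0)
            continue;                                   /* extends root */
        root = w;                                       /* new root */
        words[kept++] = w;                              /* kept <= i, safe */
    }
    return kept;
}
```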
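A trie seems suitable for your purpose. Like a hash, it gives fast string lookups, but because it stores strings character by character it is also useful for detecting whether a given string is a prefix of an already-seen string (and vice versa).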
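A minimal trie sketch in plain C, assuming words contain only lowercase ASCII letters (a restriction made purely to keep each node small). Trie, trie_insert, trie_has_prefix_of and trie_is_prefix_of_seen are hypothetical names. trie_has_prefix_of answers the root-word question directly: it walks down the new word and stops at the first node that ends an inserted word.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Node of a trie over 'a'..'z' only (an assumption for brevity). */
typedef struct Trie {
    struct Trie *child[26];
    bool is_word;   /* true if an inserted word ends here */
} Trie;

static Trie *trie_new(void) {
    return calloc(1, sizeof(Trie));   /* all children start out NULL */
}

static bool valid_word(const char *s) {
    for (; *s != '\0'; s++)
        if (*s < 'a' || *s > 'z')
            return false;
    return true;
}

/* Inserts s; returns false for characters outside a-z or if out of memory. */
static bool trie_insert(Trie *t, const char *s) {
    if (!valid_word(s))
        return false;
    for (; *s != '\0'; s++) {
        int i = *s - 'a';
        if (t->child[i] == NULL && (t->child[i] = trie_new()) == NULL)
            return false;
        t = t->child[i];
    }
    t->is_word = true;
    return true;
}

/* True if some inserted word is a prefix of s (or equal to it). */
static bool trie_has_prefix_of(const Trie *t, const char *s) {
    for (; t != NULL; s++) {
        if (t->is_word)
            return true;
        if (*s < 'a' || *s > 'z')   /* also stops at the terminating '\0' */
            return false;
        t = t->child[*s - 'a'];
    }
    return false;
}

/* True if the non-empty string s is a prefix of some inserted word. */
static bool trie_is_prefix_of_seen(const Trie *t, const char *s) {
    for (; *s != '\0' && t != NULL; s++) {
        if (*s < 'a' || *s > 'z')
            return false;
        t = t->child[*s - 'a'];
    }
    return t != NULL;
}

static void trie_free(Trie *t) {
    if (t == NULL)
        return;
    for (int i = 0; i < 26; i++)
        trie_free(t->child[i]);
    free(t);
}
```

Feeding the words in sorted (or shortest-first) order and inserting a word only when trie_has_prefix_of returns false keeps just the roots. Each node holds 26 child pointers, so for a large word list expect a noticeable memory cost.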
I used an NSSet to ensure that you only have one copy of a word added at a time. It will add a word if the set does not already contain it. It then checks whether any word that has already been added is a substring of the new word; if so, it won't add the new word. It's case-insensitive as well.
What I've written is a refactoring of your code. It's probably not that much faster, but you really do want a tree data structure if you want to make it a lot faster when you search for words that have already been added.
Take a look at red-black trees or B-trees.
Words.txt
objective
objectively
cappucin
cappucino
cappucine
programme
programmer
programmatic
programmatically
Source Code
- (void)addRootWords {
NSString *textFile = [[NSBundle mainBundle] pathForResource:@"words" ofType:@"txt"];
NSString *string = [NSString stringWithContentsOfFile:textFile encoding:NSUTF8StringEncoding error:NULL];
NSArray *wordFile = [string componentsSeparatedByString:#"\n"];
NSMutableSet *goodWords = [[NSMutableSet alloc] init];
for (NSString *newWord in wordFile)
{
NSLog(@"Word: %@", newWord);
if ([newWord length] > 8)
{
NSLog(@"Word '%@' has more than 8 characters", newWord);
BOOL shouldAddWord = NO;
if ( [goodWords containsObject:newWord] == NO) {
shouldAddWord = YES;
}
for (NSString *existingWord in goodWords)
{
NSRange textRange = [[newWord lowercaseString] rangeOfString:[existingWord lowercaseString]];
if( textRange.location != NSNotFound ) {
// newWord contains existingWord as a substring
shouldAddWord = NO;
break;
}
NSLog(@"(word:%@) does not contain (substring:%@)", newWord, existingWord);
shouldAddWord = YES;
}
if (shouldAddWord) {
NSLog(@"Adding word: %@", newWord);
[goodWords addObject:newWord];
}
}
}
NSLog(@"***Added words***");
int count = 1;
for (NSString *word in goodWords) {
NSLog(@"%d: %@", count, word);
count++;
}
[goodWords release];
}
Output:
***Added words***
1: cappucino
2: programme
3: objective
4: programmatic
5: cappucine
I am trying to generate an NSDictionary that can be used to populate a list view with data I retrieved from an SQL statement. When I create an array and add it, it adds the arrays for ALL my keys and not just for the current key. I've tried removeAllObjects on the array, but for some reason that destroys ALL the data I already put in the dictionary.
//open the database
if(sqlite3_open([dbPath UTF8String], &database) == SQLITE_OK)
{
const char *sql = "select alphaID, word from words order by word";
sqlite3_stmt *selectStatement;
//prepare the select statement
int returnValue = sqlite3_prepare_v2(database, sql, -1, &selectStatement, NULL);
if(returnValue == SQLITE_OK)
{
NSMutableArray *NameArray = [[NSMutableArray alloc] init];
NSString *alphaTemp = [[NSString alloc] init];
//loop all the rows returned by the query.
while(sqlite3_step(selectStatement) == SQLITE_ROW)
{
NSString *currentAlpha = [NSString stringWithUTF8String:(char *)sqlite3_column_text(selectStatement, 1)];
NSString *definitionName = [NSString stringWithUTF8String:(char *)sqlite3_column_text(selectStatement, 2)];
if (alphaTemp == nil){
alphaTemp = currentAlpha;
}
if ([alphaTemp isEqualToString:(NSString *)currentAlpha]) {
[NameArray addObject:definitionName];
}
else if (alphaTemp != (NSString *)currentAlpha) {
[self.words setObject:NameArray forKey:currentAlpha];
[NameArray removeAllObjects];
[NameArray addObject:definitionName];
}
}
}
The statement above adds all the keys but then removes all the array elements for all keys. If I take out the removeAllObjects call, it adds ALL the array elements for ALL keys. I don't want this; I want it to add the array elements FOR the specific key, then move on to the next key.
In the end I want an NSDictionary like this:
A (array)
Alpha (string)
Apple (string)
B (array)
Beta (string)
Ball (string)
C (array)
Code (string)
...
Though I don't think it affects your problem, from the way I read your code, you should change
NSString *alphaTemp = [[NSString alloc] init];
to
NSString *alphaTemp = nil;
since alphaTemp is just used to point to an NSString that is generated initially as currentAlpha. You also should call [NameArray release] at some point below the code you've given, since you alloc'd it.
The real issue is that you are repeatedly adding pointers to the same NSMutableArray to your NSDictionary (self.words). I can see two ways to fix this:
Change
[self.words setObject:NameArray forKey:currentAlpha];
to
[self.words setObject:[NSArray arrayWithArray:NameArray] forKey:currentAlpha];
so that you are adding a newly-created (non-mutable) NSArray to your NSDictionary.
-- or --
Insert
[NameArray release];
NameArray = [[NSMutableArray alloc] init];
after
[self.words setObject:NameArray forKey:currentAlpha];
so that once you've inserted the NSMutableArray into the NSDictionary, you create a new NSMutableArray for the next pass.
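The underlying rule, one container per key, sketched in plain C: the rows arrive sorted, and whenever the first letter changes a fresh group is started instead of reusing the previous one, so no two letters ever share storage. Group and group_by_first_letter are hypothetical, and each group here is a slice of the sorted array rather than an NSArray.

```c
#include <stddef.h>

/* One group per first letter, recorded as its own slice of the array. */
typedef struct {
    char letter;
    size_t start;   /* index of the group's first word */
    size_t count;
} Group;

/* words must be sorted so equal first letters are adjacent (as with
   "order by word") and must be non-empty. Returns the number of groups
   written (at most max_groups). */
static size_t group_by_first_letter(const char *const words[], size_t n,
                                    Group groups[], size_t max_groups) {
    size_t g = 0;
    for (size_t i = 0; i < n; i++) {
        char letter = words[i][0];
        if (g > 0 && groups[g - 1].letter == letter) {
            groups[g - 1].count++;        /* same key: extend current group */
        } else {
            if (g == max_groups)
                break;                    /* no room for another group */
            groups[g].letter = letter;    /* key changed: start a fresh group */
            groups[g].start = i;
            groups[g].count = 1;
            g++;
        }
    }
    return g;
}
```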