How can I optimise out this nested for loop? - objective-c

The program should go through each word in the array created from the word text file, and if it's greater than 8 characters, add it to the goodWords array. But the caveat is that I only want the root word to be in the goodWords array, for example:
If greet is added to the array, I don't want greets or greetings or greeters, etc.
NSString *string = [NSString stringWithContentsOfFile:@"/Users/james/dev/WordParser/word.txt" encoding:NSUTF8StringEncoding error:NULL];
NSArray *words = [string componentsSeparatedByString:@"\r\n"];
NSMutableArray *goodWords = [NSMutableArray array];
BOOL shouldAddToGoodWords = YES;
for (NSString *word in words)
{
    NSLog(@"Word: %@", word);
    if ([word length] > 8)
    {
        NSLog(@"Word is greater than 8");
        for (NSString *existingWord in [goodWords reverseObjectEnumerator])
        {
            NSLog(@"Existing Word: %@", existingWord);
            if ([word rangeOfString:existingWord].location != NSNotFound)
            {
                NSLog(@"Not adding...");
                shouldAddToGoodWords = NO;
                break;
            }
        }
        if (shouldAddToGoodWords)
        {
            NSLog(@"Adding word: %@", word);
            [goodWords addObject:word];
        }
    }
    shouldAddToGoodWords = YES;
}

How about something like this?
//load the words from wherever
NSString * allWords = [NSString stringWithContentsOfFile:@"/usr/share/dict/words" encoding:NSUTF8StringEncoding error:NULL];
//create a mutable array of the words
NSMutableArray * words = [[allWords componentsSeparatedByCharactersInSet:[NSCharacterSet newlineCharacterSet]] mutableCopy];
//remove any words that are shorter than 8 characters
[words filterUsingPredicate:[NSPredicate predicateWithFormat:@"length >= 8"]];
//sort the words in ascending order
[words sortUsingSelector:@selector(caseInsensitiveCompare:)];
//create a set of indexes (these will be the non-root words)
NSMutableIndexSet * badIndexes = [NSMutableIndexSet indexSet];
//remember our current root word
NSString * currentRoot = nil;
NSUInteger count = [words count];
//loop through the words
for (NSUInteger i = 0; i < count; ++i) {
    NSString * word = [words objectAtIndex:i];
    if (currentRoot == nil) {
        //base case
        currentRoot = word;
    } else if ([word hasPrefix:currentRoot]) {
        //word is a non-root word. remember this index to remove it later
        [badIndexes addIndex:i];
    } else {
        //no match. this word is our new root
        currentRoot = word;
    }
}
//remove the non-root words
[words removeObjectsAtIndexes:badIndexes];
NSLog(@"%@", words);
[words release];
This runs very very quickly on my machine (2.8GHz MBP).

A trie seems suitable for your purpose. Like a hash table it offers fast lookups, and it is particularly useful for detecting whether a given string is a prefix of a string you have already seen, or the other way round.
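To make the trie idea concrete, here is a minimal sketch in plain C (which also compiles inside an Objective-C file). It is not code from any of the answers: the names `TrieNode`, `trie_new` and `trie_add_root` are made up for illustration, and it assumes the words are lowercase ASCII and arrive in sorted order, so that a root such as greet is seen before greets. It only detects prefixes, not substrings in the middle of a word.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical trie node for lowercase ASCII words ('a'..'z' only). */
typedef struct TrieNode {
    struct TrieNode *child[26];
    bool is_word;               /* true if an accepted word ends here */
} TrieNode;

TrieNode *trie_new(void) {
    /* calloc zeroes the node: all children NULL, is_word false */
    return calloc(1, sizeof(TrieNode));
}

/* Adds word unless an already accepted word is a prefix of it (or equal
   to it). Returns true if the word was stored as a new root. */
bool trie_add_root(TrieNode *root, const char *word) {
    if (*word == '\0') return false;          /* ignore empty strings */
    TrieNode *node = root;
    for (const char *p = word; *p; p++) {
        int i = *p - 'a';
        if (i < 0 || i >= 26) return false;   /* outside the assumed alphabet */
        if (node->child[i] == NULL) {
            node->child[i] = trie_new();
            if (node->child[i] == NULL) return false;  /* out of memory */
        }
        node = node->child[i];
        if (node->is_word) return false;      /* an earlier word is a prefix */
    }
    node->is_word = true;
    return true;
}
```

Each call costs time proportional to the length of the word, regardless of how many roots are already stored. Freeing the nodes is left out to keep the sketch short.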

I used an NSSet to ensure that only one copy of each word is added. A word is considered only if the set does not already contain it. The code then checks whether any word that has already been added is a substring of the new word; if so, the new word is not added. The comparison is case-insensitive as well.
What I've written is a refactoring of your code. It's probably not much faster; if you want searching the words you have already added to be a lot faster, you really do want a tree data structure.
Take a look at red-black trees or B-trees.
Words.txt
objective
objectively
cappucin
cappucino
cappucine
programme
programmer
programmatic
programmatically
Source Code
- (void)addRootWords {
    NSString *textFile = [[NSBundle mainBundle] pathForResource:@"words" ofType:@"txt"];
    NSString *string = [NSString stringWithContentsOfFile:textFile encoding:NSUTF8StringEncoding error:NULL];
    NSArray *wordFile = [string componentsSeparatedByString:@"\n"];
    NSMutableSet *goodWords = [[NSMutableSet alloc] init];
    for (NSString *newWord in wordFile)
    {
        NSLog(@"Word: %@", newWord);
        if ([newWord length] > 8)
        {
            NSLog(@"Word '%@' contains more than 8 characters", newWord);
            BOOL shouldAddWord = NO;
            if ( [goodWords containsObject:newWord] == NO) {
                shouldAddWord = YES;
            }
            for (NSString *existingWord in goodWords)
            {
                NSRange textRange = [[newWord lowercaseString] rangeOfString:[existingWord lowercaseString]];
                if( textRange.location != NSNotFound ) {
                    // newWord contains existingWord as a substring
                    shouldAddWord = NO;
                    break;
                }
                NSLog(@"(word:%@) does not contain (substring:%@)", newWord, existingWord);
                shouldAddWord = YES;
            }
            if (shouldAddWord) {
                NSLog(@"Adding word: %@", newWord);
                [goodWords addObject:newWord];
            }
        }
    }
    NSLog(@"***Added words***");
    int count = 1;
    for (NSString *word in goodWords) {
        NSLog(@"%d: %@", count, word);
        count++;
    }
    [goodWords release];
}
Output:
***Added words***
1: cappucino
2: programme
3: objective
4: programmatic
5: cappucine

Related

Objective-C: How to find the most common string in an array?

I have an array of strings from an online database, and I am trying to determine the most commonly used word. The values inside the array will vary, but I want to find the most common word in whatever collection of words I'm using. If, theoretically, I had an array of the following...
NSArray *stringArray = [NSArray arrayWithObjects:@"Duck", @"Duck", @"Duck", @"Duck", @"Goose", nil];
How do I iterate through this array to determine the most common string, which would obviously be "Duck"?
Simplest way is probably NSCountedSet:
NSCountedSet* stringSet = [[NSCountedSet alloc] initWithArray:stringArray];
NSString* mostCommon = nil;
NSUInteger highestCount = 0;
for(NSString* string in stringSet) {
    NSUInteger count = [stringSet countForObject:string];
    if(count > highestCount) {
        highestCount = count;
        mostCommon = string;
    }
}
You can use the word as a key into a dictionary.
NSMutableDictionary *words = [NSMutableDictionary dictionary];
for (NSString *word in stringArray) {
    if (!words[word]) {
        [words setValue:[NSDecimalNumber zero] forKey:word];
    }
    words[word] = [words[word] decimalNumberByAdding:[NSDecimalNumber one]];
}
Now iterate through words and find the key with the highest value.
NSString *mostCommon = nil;
NSDecimalNumber *curMax = [NSDecimalNumber zero];
for (NSString *key in [words allKeys]) {
    if ([words[key] compare:curMax] == NSOrderedDescending) {
        mostCommon = key;
        curMax = words[key];
    }
}
NSLog(@"Most Common Word: %@", mostCommon);
EDIT: Rather than looping through the array once and then looping separately through the dictionary, I think we can do better and do it all in a single loop.
NSString *mostCommon = nil;
NSDecimalNumber *curMax = [NSDecimalNumber zero];
NSMutableDictionary *words = [NSMutableDictionary dictionary];
for (NSString *word in stringArray) {
    if (!words[word]) {
        [words setValue:[NSDecimalNumber zero] forKey:word];
    }
    words[word] = [words[word] decimalNumberByAdding:[NSDecimalNumber one]];
    if ([words[word] compare:curMax] == NSOrderedDescending) {
        mostCommon = word;
        curMax = words[word];
    }
}
NSLog(@"Most Common Word: %@", mostCommon);
This should be significantly faster than my answer pre-edit, though I don't know how it compares to using the NSCountedSet answer.
Try using NSPredicate.
NSUInteger count = 0;
NSString *mostCommonStr = nil;
for (NSString *strValue in stringArray) {
    // count how many entries match this string (case- and diacritic-insensitive)
    NSUInteger countStr = [[stringArray filteredArrayUsingPredicate:[NSPredicate predicateWithFormat:@"self MATCHES[cd] %@", strValue]] count];
    if (countStr > count) {
        count = countStr;
        mostCommonStr = strValue;
    }
}
NSLog(@"The most common string is %@", mostCommonStr);
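For comparison with the counting approaches in this thread, here is a rough sketch of a different technique in plain C: sort the strings, then count runs of equal neighbours. This is not taken from any of the answers; `most_common` and `compare_strings` are names made up for the example, and it works on C strings rather than NSString objects.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings */
static int compare_strings(const void *a, const void *b) {
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Returns the most frequent string in words[0..n-1], or NULL if n == 0
   or memory runs out. On ties the string that sorts first wins.
   The input array itself is not modified. */
const char *most_common(const char *const *words, size_t n) {
    if (n == 0) return NULL;
    const char **sorted = malloc(n * sizeof *sorted);
    if (sorted == NULL) return NULL;
    memcpy(sorted, words, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, compare_strings);

    const char *best = sorted[0];
    size_t best_count = 0, run = 0;
    for (size_t i = 0; i < n; i++) {
        /* extend the current run of equal strings, or start a new one */
        run = (i > 0 && strcmp(sorted[i], sorted[i - 1]) == 0) ? run + 1 : 1;
        if (run > best_count) {
            best_count = run;
            best = sorted[i];
        }
    }
    free(sorted);
    return best;
}
```

Sorting costs O(n log n) comparisons, which sits between a hash-based count and filtering the whole array once per element.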

Algorithm to find anagrams Objective-C

I've got an algorithm to find anagrams within a group of eight-letter words. Effectively it's alphabetizing the letters in the longer word, doing the same with the shorter words one by one, and seeing if they exist in the longer word, like so:
tower = eortw
two = otw
rot = ort
The issue here is that if I look for ort in eortw (or rot in tower), it'll find it, no problem. Rot is found inside tower. However, otw is not inside eortw (or two in tower), because of the R in the middle. Ergo, it doesn't think two is found in tower.
Is there a better way I can do this? I'm trying to do it in Objective-C, and both the eight-letter words and regular words are stored in NSDictionaries (with their normal and alphabetized forms).
I've looked at various other posts re. anagrams on StackOverflow, but none seem to address this particular issue.
Here's what I have so far:
- (BOOL) doesEightLetterWord: (NSString* )haystack containWord: (NSString *)needle {
    for (int i = 0; i < [needle length] + 1; i++) {
        if (!needle) {
            NSLog(@"DONE!");
        }
        NSString *currentCharacter = [needle substringWithRange:NSMakeRange(i, 1)];
        NSCharacterSet *set = [NSCharacterSet characterSetWithCharactersInString: currentCharacter];
        NSLog(@"Current character is %@", currentCharacter);
        if ([haystack rangeOfCharacterFromSet:set].location == NSNotFound) {
            NSLog(@"The letter %@ isn't found in the word %@", currentCharacter, haystack);
            return FALSE;
        } else {
            NSLog(@"The letter %@ is found in the word %@", currentCharacter, haystack);
            int currentLocation = [haystack rangeOfCharacterFromSet: set].location;
            currentLocation++;
            NSString *newHaystack = [haystack substringFromIndex: currentLocation];
            NSString *newNeedle = [needle substringFromIndex: i + 1];
            NSLog(@"newHaystack is %@", newHaystack);
            NSLog(@"newNeedle is %@", newNeedle);
        }
    }
}
If you use only part of the letters it isn't a true anagram.
A good algorithm in your case would be to take the sorted strings and compare them letter by letter, skipping mismatches in the longer word. If you reach the end of the shorter word, then you have a match:
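char *p1 = shorter_word;
char *p2 = longer_word;
int match = TRUE;
for (; *p1; p1++) {
    /* skip letters of the longer word that don't match */
    while (*p2 && (*p2 != *p1)) {
        p2++;
    }
    if (!*p2) {
        /* Letters of shorter word are not contained in longer word */
        match = FALSE;
        break;
    }
    /* consume the matched letter so it cannot be matched twice */
    p2++;
}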
This is one approach I might take for finding out whether one ordered word contains all of the letters of another ordered word. Note that it won't find true anagrams (that simply requires the two ordered strings to be the same), but it does what I think you're asking for:
+(BOOL) does: (NSString* )longWord contain: (NSString *)shortWord {
    NSString *haystack = [longWord copy];
    NSString *needle = [shortWord copy];
    while([haystack length] > 0 && [needle length] > 0) {
        NSCharacterSet *set = [NSCharacterSet characterSetWithCharactersInString: [needle substringToIndex:1]];
        if ([haystack rangeOfCharacterFromSet:set].location == NSNotFound) {
            return NO;
        }
        haystack = [haystack substringFromIndex: [haystack rangeOfCharacterFromSet: set].location+1];
        needle = [needle substringFromIndex: 1];
    }
    return YES;
}
The simplest (but not most efficient) way might be to use NSCountedSet. We can do this because, for counted sets, [a isSubsetOfSet:b] returns YES if and only if [a countForObject:object] <= [b countForObject:object] for every object in a.
Let's add a category to NSString to do it:
@interface NSString (lukech_superset)
- (BOOL)lukech_isSupersetOfString:(NSString *)needle;
@end
@implementation NSString (lukech_superset)
- (NSCountedSet *)lukech_countedSetOfCharacters {
    NSCountedSet *set = [NSCountedSet set];
    [self enumerateSubstringsInRange:NSMakeRange(0, self.length) options:NSStringEnumerationByComposedCharacterSequences usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
        [set addObject:substring];
    }];
    return set;
}
- (BOOL)lukech_isSupersetOfString:(NSString *)needle {
    return [[needle lukech_countedSetOfCharacters] isSubsetOfSet:[self lukech_countedSetOfCharacters]];
}
@end
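The counting condition behind the counted-set approach (every letter of the short word must occur at least as often in the long word) can also be sketched with a plain array of letter counts. The following is an illustrative C version, not part of any answer; `contains_letters` is a made-up name, and it assumes ASCII letters only and ignores case.

```c
#include <stdbool.h>
#include <ctype.h>

/* Returns true if every letter of needle occurs in haystack at least as
   many times as it occurs in needle. Only 'a'..'z' (case-insensitive)
   is counted; any other character in needle makes the check fail. */
bool contains_letters(const char *haystack, const char *needle) {
    int counts[26] = {0};
    /* tally the letters available in the longer word */
    for (const char *p = haystack; *p; p++) {
        int c = tolower((unsigned char)*p);
        if (c >= 'a' && c <= 'z') counts[c - 'a']++;
    }
    /* use up one available letter for each letter of the shorter word */
    for (const char *p = needle; *p; p++) {
        int c = tolower((unsigned char)*p);
        if (c < 'a' || c > 'z') return false;
        if (--counts[c - 'a'] < 0) return false;
    }
    return true;
}
```

With the words from the question, `contains_letters("tower", "two")` and `contains_letters("tower", "rot")` are both true, and no sorting of the letters is needed.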

Check if NSString contains all or some characters

I have an NSString called query which contains ~10 characters.
I would like to check to see if a second NSString called word contains all of the characters in query, or some characters, but no other characters which aren't specified in query.
Also, if there is only one occurrence of the character in the query, there can only be one occurrence of the character in the word.
Please could you tell me how to do this?
NSString *query = @"ABCDEFJAKSUSHFKLAFIE";
NSString *word = @"fearing"; //would pass as NO as there is no 'n' in the query var.
The following answers the first half:
NSCharacterSet *nonQueryChars = [[NSCharacterSet characterSetWithCharactersInString:[query lowercaseString]] invertedSet];
NSRange badCharRange = [[word lowercaseString] rangeOfCharacterFromSet:nonQueryChars];
if (badCharRange.location == NSNotFound) {
    // word only has characters in query
} else {
    // found unwanted characters in word
}
I need to think about the second half of the requirement.
Ok, the following code should fulfill both requirements:
- (NSCountedSet *)wordLetters:(NSString *)text {
    NSCountedSet *res = [NSCountedSet set];
    for (NSUInteger i = 0; i < text.length; i++) {
        [res addObject:[text substringWithRange:NSMakeRange(i, 1)]];
    }
    return res;
}
- (void)checkWordAgainstQuery {
    NSString *query = @"ABCDEFJAKSUSHFKLAFIE";
    NSString *word = @"fearing";
    NSCountedSet *queryLetters = [self wordLetters:[query lowercaseString]];
    NSCountedSet *wordLetters = [self wordLetters:[word lowercaseString]];
    BOOL ok = YES;
    for (NSString *wordLetter in wordLetters) {
        int wordCount = [wordLetters countForObject:wordLetter];
        // queryCount will be 0 if this word letter isn't in query
        int queryCount = [queryLetters countForObject:wordLetter];
        if (wordCount > queryCount) {
            ok = NO;
            break;
        }
    }
    if (ok) {
        // word matches against query
    } else {
        // word has extra letter or too many of a matching letter
    }
}

Cutting the length of an NSString without splitting the last word

I'm trying to cut the length of an NSString without splitting the last word with this method:
// cut a string by words
- (NSString* )stringCutByWords:(NSString *)string toLength:(int)length;
{
    // search backwards in the string for the beginning of the last word
    while ([string characterAtIndex:length] != ' ' && length > 0) {
        length--;
    }
    // if the last word was the first word of the string search for the end of the word
    if (length <= 0){
        while ([string characterAtIndex:length] != ' ' && length > string.length-1) {
            length++;
        }
    }
    // define the range you're interested in
    NSRange stringRange = {0, length};
    // adjust the range to include dependent chars
    stringRange = [string rangeOfComposedCharacterSequencesForRange:stringRange];
    // Now you can create the short string
    string = [string substringWithRange:stringRange];
    return [NSString stringWithFormat:@"%@...",string];
}
Now my question is:
Is there a built-in way in Objective-C or Cocoa Touch that I did not see, or is there a "nicer" way to do this? I am not very happy with this solution.
Greetings and thanks for the help,
C4rmel
My proposal for a Category method
@interface NSString (Cut)
-(NSString *)stringByCuttingExceptLastWordWithLength:(NSUInteger)length;
@end
@implementation NSString (Cut)
-(NSString *)stringByCuttingExceptLastWordWithLength:(NSUInteger)length
{
    __block NSMutableString *newString = [NSMutableString string];
    NSArray *components = [self componentsSeparatedByString:@" "];
    if ([components count] > 0) {
        NSString *lastWord = [components objectAtIndex:[components count]-1];
        [components enumerateObjectsUsingBlock:^(NSString *obj, NSUInteger idx, BOOL *stop) {
            if (([obj length]+[newString length] + [lastWord length] + 2) < length) {
                [newString appendFormat:@" %@", obj];
            } else {
                [newString appendString:@"…"];
                [newString appendFormat:@" %@", lastWord];
                *stop = YES;
            }
        }];
    }
    return newString;
}
@end
Usage:
NSString *string = @"Hello World! I am standing over here! Can you see me?";
NSLog(@"%@", [string stringByCuttingExceptLastWordWithLength:25]);
Suggestions:
make it a category method;
use NSCharacterSet and the built-in search methods rather than rolling your own.
So:
/* somewhere public */
@interface NSString (CutByWords)
- (NSString *)stringCutByWordsToMaxLength:(int)length;
@end
/* in an implementation file, somewhere */
@implementation NSString (CutByWords)
// cut a string by words
- (NSString *)stringCutByWordsToMaxLength:(int)length
{
    NSCharacterSet *whitespaceCharacterSet =
        [NSCharacterSet whitespaceCharacterSet];
    // to consider: a range check on length here?
    NSRange relevantRange = NSMakeRange(0, length);
    // find beginning of last word
    NSRange lastWordRange =
        [self rangeOfCharacterFromSet:whitespaceCharacterSet
                              options:NSBackwardsSearch
                                range:relevantRange];
    // if the last word was the first word of the string,
    // consume the whole range; this looks to be the same
    // effect as the original scan forward given that the
    // assumption is already made in the scan backwards that
    // the string doesn't end on a whitespace; if I'm wrong
    // then get [whitespaceCharacterSet invertedSet] and do
    // a search forwards
    NSRange stringRange;
    if(lastWordRange.location == NSNotFound)
    {
        stringRange = relevantRange;
    }
    else
    {
        // cut at the whitespace in front of the last word
        stringRange = NSMakeRange(0, lastWordRange.location);
    }
    // adjust the range to include dependent chars
    stringRange = [self rangeOfComposedCharacterSequencesForRange:stringRange];
    // Now you can create the short string
    NSString *string = [self substringWithRange:stringRange];
    return [NSString stringWithFormat:@"%@...",string];
}
@end
/* subsequently */
NSString *string = ...whatever...;
NSString *cutString = [string stringCutByWordsToMaxLength:100];
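The underlying idea (search backwards from the limit for a space, and fall back to a hard cut when the first word alone is too long) can be shown without Foundation. Below is a minimal sketch in plain C under some loud assumptions: `cut_by_words` is a hypothetical helper, the text is single-byte ASCII, and only the space character counts as a word separator. Composed character sequences, which the Objective-C versions handle, are ignored here.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of text cut to at most max_len bytes
   without splitting a word, followed by "..." when something was cut.
   Returns NULL if allocation fails. The caller frees the result. */
char *cut_by_words(const char *text, size_t max_len) {
    size_t len = strlen(text);
    if (len <= max_len) {
        /* short enough already: return an unmodified copy */
        char *copy = malloc(len + 1);
        if (copy) memcpy(copy, text, len + 1);
        return copy;
    }
    size_t cut = max_len;
    /* back up to the space in front of the word that would be split */
    while (cut > 0 && text[cut] != ' ') cut--;
    /* no space found: the first word is too long, so cut it hard */
    if (cut == 0) cut = max_len;
    /* drop trailing spaces before the ellipsis */
    while (cut > 0 && text[cut - 1] == ' ') cut--;
    char *out = malloc(cut + 4);
    if (out == NULL) return NULL;
    memcpy(out, text, cut);
    memcpy(out + cut, "...", 4);   /* copies the terminating NUL too */
    return out;
}
```

For example, cutting "Hello World! I am standing over here!" to 8 bytes gives "Hello...", because the limit falls inside "World!".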

Get matched string from two NSArrays

How can I save the strings that match each other within one NSArray, at different indexes, into an NSMutableArray?
For example, there are three "apple", four "pineapple", six "banana" and two "cocoa", and the rest of the words don't have duplicates in the NSArray. I would like to know whether the NSArray has at least two of the same word. If yes, I would like to save "apple", "pineapple", "banana" and "cocoa" once each in the NSMutableArray. If there are other repeated words, I would like to add them to the NSMutableArray too.
My code (which still doesn't work properly):
NSArray *noWords = [[NSArray alloc] initWithArray:
                    [[NSString stringWithContentsOfFile:[[NSBundle mainBundle]
                                                         pathForResource:@"words" ofType:@"txt"]
                                               encoding:NSUTF8StringEncoding error:NULL]
                     componentsSeparatedByString:@"\n"]];
NSUInteger scount = [noWords count];
int ii = 0;
NSString *stringline;
for (ii; ii < scount; ii++)
{
    stringline = [noWords objectAtIndex:ii];
    NSLog(@"stringline : %@ ", stringline);
}
int i = 1;
NSString *line;
for (i ; i < 10; i++)
{
    line = [noWords objectAtIndex:i];
    NSLog (@"line : %@ ", line);
    NSMutableArray *douwords = [NSMutableArray array];
    if ([stringline isEqualToString:line])
    {
        NSString *newword;
        for (newword in douwords)
        {
            [douwords addObject:newword];
            NSLog (@"detected! %@ ", douwords);
        }
    }
}
Here's a solution using two sets:
- (NSArray *)getDuplicates:(NSArray *)words
{
    NSMutableSet *dups = [NSMutableSet set],
                 *seen = [NSMutableSet set];
    for (NSString *word in words) {
        if ([seen containsObject:word]) {
            [dups addObject:word];
        }
        [seen addObject:word];
    }
    return [dups allObjects];
}
Assuming NSSet uses hash tables behind the scenes (which I'm betting it does), this is going to be faster than the previously suggested O(n^2) solution.
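Where no hash-based set is at hand, for example with plain C strings, sorting is a middle ground between the two complexities mentioned above: O(n log n) rather than O(n^2). The sketch below is only an illustration and not one of the posted solutions; `find_duplicates` and `compare_strings` are hypothetical names, and the caller is assumed to supply an output array with room for n / 2 entries.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings */
static int compare_strings(const void *a, const void *b) {
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sorts words in place, then writes one copy of every string that occurs
   at least twice into out (which needs room for n / 2 pointers).
   Returns the number of distinct duplicated strings. */
size_t find_duplicates(const char **words, size_t n, const char **out) {
    qsort(words, n, sizeof *words, compare_strings);
    size_t found = 0;
    for (size_t i = 1; i < n; i++) {
        int same_as_prev = strcmp(words[i], words[i - 1]) == 0;
        /* report a repeated string only at its second occurrence */
        int first_repeat = i < 2 || strcmp(words[i], words[i - 2]) != 0;
        if (same_as_prev && first_repeat) {
            out[found++] = words[i];
        }
    }
    return found;
}
```

Because the input is sorted first, the duplicates come out in sorted order, unlike the unordered result of a set.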
Here's something off the top of my head:
NSMutableSet* duplicates = [NSMutableSet set];
NSArray* words = [NSArray arrayWithObjects:@"Apple", @"Apple", @"Orange", @"Apple", @"Orange", @"Pear", nil];
[words enumerateObjectsUsingBlock:^(NSString* str, NSUInteger idx, BOOL *stop) {
    for (int i = idx + 1; i < words.count; i++) {
        if ([str isEqualToString:[words objectAtIndex:i]]) {
            [duplicates addObject:str];
            break;
        }
    }
}];
NSLog(@"Dups: %@", [duplicates allObjects]); // Prints "Apple" and "Orange"
The use of an NSSet, as opposed to an NSArray, ensures strings are not added more than once. Obviously, there are optimizations that could be done, but it should be a good starting point.
I assume that you want to count appearances of words in your array and output those with a count of more than one. A basic and verbose way to do that would be:
// Make an array of words - some duplicates
NSArray *wordList = [[NSArray alloc] initWithObjects:
                     @"Apple", @"Banana", @"Pencil",
                     @"Steve Jobs", @"Kandahar",
                     @"Apple", @"Banana", @"Apple",
                     @"Pear", @"Pear", nil];
// Make a mutable dictionary - the key will be a word from the list
// and the value will be a number representing the number of times the
// word appears in the original array. It starts off empty.
NSMutableDictionary *wordCount = [[NSMutableDictionary alloc] init];
// In turn, take each word in the word list...
for (NSString *s in wordList) {
    int count = 1;
    // If the word is already in the dictionary
    if([wordCount objectForKey:s]) {
        // Increase the count by one
        count = [[wordCount objectForKey:s] intValue] + 1;
    }
    // Save the word count in the dictionary
    [wordCount setObject:[NSNumber numberWithInt:count] forKey:s];
}
// For each word that appears more than once...
for (NSString *s in [wordCount keysOfEntriesPassingTest:
                     ^(id key, id obj, BOOL *stop) {
                         if ([obj intValue] > 1) return YES; else return NO;
                     }]) {
    // print the word and the final count
    NSLog(@"%2d %@", [[wordCount objectForKey:s] intValue], s);
}
The output would be:
3 Apple
2 Pear
2 Banana