I have an array of strings from an online database, and I'm trying to determine the most commonly used word. The values inside the array will vary, but I want to find the most common word in whatever collection of words I'm using. Suppose I had the following array:
NSArray *stringArray = [NSArray arrayWithObjects:@"Duck", @"Duck", @"Duck", @"Duck", @"Goose", nil];
How do I iterate through this array to determine the most common string, which would obviously be "Duck"?
Simplest way is probably NSCountedSet:
NSCountedSet* stringSet = [[NSCountedSet alloc] initWithArray:stringArray];
NSString* mostCommon = nil;
NSUInteger highestCount = 0;
for(NSString* string in stringSet) {
NSUInteger count = [stringSet countForObject:string];
if(count > highestCount) {
highestCount = count;
mostCommon = string;
}
}
You can use the word as a key into a dictionary.
NSMutableDictionary *words = [NSMutableDictionary dictionary];
for (NSString *word in stringArray) {
if (!words[word]) {
[words setValue:[NSDecimalNumber zero] forKey:word];
}
words[word] = [words[word] decimalNumberByAdding:[NSDecimalNumber one]];
}
Now iterate through words and find the key with the highest value.
NSString *mostCommon;
NSDecimalNumber *curMax = [NSDecimalNumber zero];
for (NSString *key in [words allKeys]) {
if ([words[key] compare:curMax] == NSOrderedDescending) {
mostCommon = key;
curMax = words[key];
}
}
NSLog(@"Most Common Word: %@", mostCommon);
EDIT: Rather than looping through the array once and then looping separately through the dictionary, I think we can do better and do it all in a single loop.
NSString *mostCommon;
NSDecimalNumber *curMax = [NSDecimalNumber zero];
NSMutableDictionary *words = [NSMutableDictionary dictionary];
for (NSString *word in stringArray) {
if (!words[word]) {
[words setValue:[NSDecimalNumber zero] forKey:word];
}
words[word] = [words[word] decimalNumberByAdding:[NSDecimalNumber one]];
if ([words[word] compare:curMax] == NSOrderedDescending) {
mostCommon = word;
curMax = words[word];
}
}
NSLog(@"Most Common Word: %@", mostCommon);
This avoids the second pass over the dictionary, so it should be faster than my pre-edit answer, though I don't know how it compares to the NSCountedSet answer.
Try using NSPredicate.
NSUInteger count=0;
NSString *mostCommonStr;
for(NSString *strValue in stringArray) {
NSUInteger countStr=[[stringArray filteredArrayUsingPredicate:[NSPredicate predicateWithFormat:@"self MATCHES[cd] %@", strValue]] count];
if(countStr > count) {
count=countStr;
mostCommonStr=strValue;
}
}
NSLog(@"The most common string is %@",mostCommonStr);
Related
I've been trying to split a string into an array of components by length, but I have no idea how to do it. I know that each component's length is 9, except the last one, but there is no separator between them. Does anyone know how I could make this split possible?
string: E44000000R33000444V33441
And I'd like to get an array with: E44000000 R33000444 V33441
In the past I've used this method, but I guess there should be a way to split by a constant length. Any ideas?
NSArray *myWords = [message componentsSeparatedByString:@";"];
Please try the below code.
NSString *stringTest = @"E44000000R33000444V33441323";
NSMutableArray *arrayTest = [NSMutableArray array];
while ([stringTest length] > 8) {
    [arrayTest addObject:[stringTest substringToIndex:9]];
    stringTest = [stringTest substringFromIndex:9];
}
// Add whatever is left (the last component may be shorter than 9 characters)
if ([stringTest length] > 0) {
    [arrayTest addObject:stringTest];
}
NSLog(@"arrayTest - %@", arrayTest);
Try this one..
NSString *mainString = @"E44000000R33000444V";
NSMutableArray *brokenString = [NSMutableArray new];
NSUInteger start = 0;
// Compare start + 9 against the length; length - 9 would underflow for short strings
for (; start + 9 < mainString.length; start += 9) {
    [brokenString addObject:[mainString substringWithRange:NSMakeRange(start, 9)]];
}
[brokenString addObject:[mainString substringFromIndex:start]];
NSLog(@"->%@", brokenString);
Output is :
->(
E44000000,
R33000444,
V
)
I looked through NSString and didn't find any method like that. But you can create a category on NSString, put this method in it, and use it as an NSString instance method.
- (NSArray *)componentsSeparatedByLength:(NSUInteger)length {
    NSMutableArray *array = [NSMutableArray new];
    NSRange range = NSMakeRange(0, length);
    NSString *subString = nil;
    // Take full-length chunks while they fit
    while (range.location + range.length <= self.length) {
        subString = [self substringWithRange:range];
        [array addObject:subString];
        range.location = range.length + range.location;
    }
    // Add the shorter remainder, if any
    if (range.location < self.length) {
        subString = [self substringFromIndex:range.location];
        [array addObject:subString];
    }
    return array;
}
You can take a substring of the length you want inside a loop over the string, passing the next index along to get the next substring. After getting each substring, add it to the array.
Use the substringToIndex: and substringFromIndex: methods to get the substrings.
Although it is not a requirement here, I want to propose a solution that can handle characters from more sophisticated writing systems, such as surrogate pairs, base characters plus combining marks, Hangul jamo, and Indic consonant clusters.
@interface NSString (Split)
-(NSArray *)arrayBySplittingWithMaximumSize:(NSUInteger)size
options:(NSStringEnumerationOptions) option;
@end
@implementation NSString (Split)
-(NSArray *)arrayBySplittingWithMaximumSize:(NSUInteger)size
options:(NSStringEnumerationOptions) option
{
NSMutableArray *letterArray = [NSMutableArray array];
[self enumerateSubstringsInRange:NSMakeRange(0, [self length])
options:(option)
usingBlock:^(NSString *substring,
NSRange substringRange,
NSRange enclosingRange,
BOOL *stop) {
[letterArray addObject:substring];
}];
NSMutableArray *array = [NSMutableArray array];
[letterArray enumerateObjectsUsingBlock:^(id obj, NSUInteger idx, BOOL *stop) {
if (idx%size == 0) {
[array addObject: [NSMutableString stringWithCapacity:size]];
}
NSMutableString *string = [array objectAtIndex:[array count]-1];
[string appendString:obj];
}];
return array;
}
@end
Usage:
NSArray *array = [@"E44000000R33000444V33441" arrayBySplittingWithMaximumSize:9
options:NSStringEnumerationByComposedCharacterSequences];
results in:
(
E44000000,
R33000444,
V33441
)
I'm trying to re-arrange words into alphabetical order. For example, tomato would become amoott, or stack would become ackst.
I've found some methods to do this in C with char arrays, but I'm having issues getting that to work within the confines of the NSString object.
Is there an easier way to do it within the NSString object itself?
You could store each of the string's characters into an NSArray of NSNumber objects and then sort that. Seems a bit expensive, so I would perhaps just use qsort() instead.
Here it's provided as an Objective-C category (untested):
NSString+SortExtension.h:
#import <Foundation/Foundation.h>
@interface NSString (SortExtension)
- (NSString *)sorted;
@end
NSString+SortExtension.m:
#import "NSString+SortExtension.h"
@implementation NSString (SortExtension)
- (NSString *)sorted
{
// init
NSUInteger length = [self length];
unichar *chars = (unichar *)malloc(sizeof(unichar) * length);
// extract
[self getCharacters:chars range:NSMakeRange(0, length)];
// sort (for western alphabets only)
qsort_b(chars, length, sizeof(unichar), ^(const void *l, const void *r) {
unichar left = *(unichar *)l;
unichar right = *(unichar *)r;
return (int)(left - right);
});
// recreate
NSString *sorted = [NSString stringWithCharacters:chars length:length];
// clean-up
free(chars);
return sorted;
}
@end
I think you can split the string into an array of strings (each string in the array contains only one character from the original string) and then sort the array. This is not efficient, but it is good enough when the string is not very long. I've tested the code.
NSString *str = @"stack";
NSMutableArray *charArray = [NSMutableArray arrayWithCapacity:str.length];
for (NSUInteger i = 0; i < str.length; ++i) {
    NSString *charStr = [str substringWithRange:NSMakeRange(i, 1)];
    [charArray addObject:charStr];
}
NSString *sortedStr = [[charArray sortedArrayUsingSelector:@selector(localizedCaseInsensitiveCompare:)] componentsJoinedByString:@""];
// --------- Function To Make an Array from String
NSArray *makeArrayFromString(NSString *my_string) {
NSMutableArray *array = [[NSMutableArray alloc] init];
for (int i = 0; i < my_string.length; i ++) {
[array addObject:[NSString stringWithFormat:@"%C", [my_string characterAtIndex:i]]];
}
return array;
}
// --------- Function To Sort Array
NSArray *sortArrayAlphabetically(NSArray *my_array) {
my_array= [my_array sortedArrayUsingSelector:@selector(localizedCaseInsensitiveCompare:)];
return my_array;
}
// --------- Function Combine Array To Single String
NSString *combineArrayIntoString(NSArray *my_array) {
NSString * combinedString = [[my_array valueForKey:@"description"] componentsJoinedByString:@""];
return combinedString;
}
// Now you can call the functions as in below where string_to_arrange is your string
NSArray *blowUpArray;
blowUpArray = makeArrayFromString(string_to_arrange);
blowUpArray = sortArrayAlphabetically(blowUpArray);
NSString *arrayToString= combineArrayIntoString(blowUpArray);
NSLog(@"arranged string = %@",arrayToString);
Just another example using NSMutableString and sortUsingComparator:
NSMutableString *mutableString = [[NSMutableString alloc] initWithString:@"tomat"];
[mutableString appendString:@"o"];
NSLog(@"Original string: %@", mutableString);
NSMutableArray *charArray = [NSMutableArray array];
for (NSUInteger i = 0; i < mutableString.length; ++i) {
    [charArray addObject:[NSNumber numberWithChar:[mutableString characterAtIndex:i]]];
}
[charArray sortUsingComparator:^NSComparisonResult(id _Nonnull obj1, id _Nonnull obj2) {
    if ([obj1 charValue] < [obj2 charValue]) return NSOrderedAscending;
    if ([obj1 charValue] > [obj2 charValue]) return NSOrderedDescending;
    return NSOrderedSame;
}];
[mutableString setString:@""];
for (NSUInteger i = 0; i < charArray.count; ++i) {
    [mutableString appendFormat:@"%c", [charArray[i] charValue]];
}
NSLog(@"Sorted string: %@", mutableString);
Output:
Original string: tomato
Sorted string: amoott
So pretty much I want to check if my NSString from my NSArray is a substring of my string named imageName.
So let's say this:
My Image name is: picture5of-batman.png
My Array contains strings and one of them is: Batman
So pretty much I want to eliminate the: picture5of- part of the image name and replace it with the NSString from the NSArray.
This is how I tried to do it, but it never makes it into the if statement. And no, my array is not nil either. Here is the code:
for (NSString *string in superheroArray) {
if ([string rangeOfString:imageName].location != NSNotFound) {
//Ok so some string in superheroArray is equal to the file name of the image
imageName = [imageName stringByReplacingOccurrencesOfString:@"" withString:string
options:NSCaseInsensitiveSearch range:NSMakeRange(0, string.length)];
}
}
Edit1: This still does not work
for (NSString *string in superheroArray) {
if ([imageName rangeOfString:string options:NSCaseInsensitiveSearch].location != NSNotFound) {
//Ok so some string in superheroArray is equal to the file name of the image
imageName = string;
//HOW ABOUT THAT FOR EFFICIENCY :P
}
}
[imageName rangeOfString:string options: NSCaseInsensitiveSearch]
I don't see why it's not working in your code; maybe split the NSString calls from the NSRange test.
But this works here:
NSArray *ar = [NSArray arrayWithObjects:@"Batman", @"Maurice", nil];
__block NSString *imageName = @"picture5of-batman.png";
// NSNotFound marks "no match"; -1 is not a valid NSUInteger sentinel
__block NSUInteger theIndex = NSNotFound;
[ar enumerateObjectsUsingBlock:^(id obj, NSUInteger idx, BOOL *stop) {
    NSRange r = [imageName rangeOfString:obj
                                 options:NSCaseInsensitiveSearch];
    if (r.location != NSNotFound)
    {
        theIndex = idx;
        NSString *str = [imageName pathExtension];
        imageName = [(NSString *)obj stringByAppendingPathExtension:str];
        // you found it, so you can stop now
        *stop = YES;
    }
}];
if (theIndex != NSNotFound)
{
    NSLog(@"The index is : %lu and new imageName == %@", (unsigned long)theIndex, imageName);
}
And here is the NSLog statement :
2011-12-10 23:04:28.967 testSwitch1[2493:207] The index is : 0 and new imageName == Batman.png
How can I save the strings that match at different indexes of an NSArray into an NSMutableArray?
For example, there are three "apple", four "pineapple", six "banana", and two "cocoa", and the rest of the words don't have duplicates in the NSArray. I would like to know if the NSArray has at least two of the same word. If yes, I would like to save "apple", "pineapple", "banana" and "cocoa" once each in an NSMutableArray. If there are other repeated words, I would like to add them to the NSMutableArray too.
My code (which still doesn't work properly):
NSArray *noWords = [[NSArray alloc] initWithArray:
[[NSString stringWithContentsOfFile:[[NSBundle mainBundle]
pathForResource:@"words" ofType:@"txt"]
encoding:NSUTF8StringEncoding error:NULL]
componentsSeparatedByString:@"\n"]];
NSUInteger scount = [noWords count];
int ii = 0;
NSString *stringline;
for (ii; ii < scount; ii++)
{
stringline = [noWords objectAtIndex:ii];
NSLog(@"stringline : %@ ", stringline);
}
int i = 1;
NSString *line;
for (i ; i < 10; i++)
{
line = [noWords objectAtIndex:i];
NSLog (@"line : %@ ", line);
NSMutableArray *douwords = [NSMutableArray array];
if ([stringline isEqualToString:line])
{
NSString *newword;
for (newword in douwords)
{
[douwords addObject:newword];
NSLog (@"detected! %@ ", douwords);
}
}
}
Here's a solution using two sets:
- (NSArray *)getDuplicates:(NSArray *)words
{
NSMutableSet *dups = [NSMutableSet set],
*seen = [NSMutableSet set];
for (NSString *word in words) {
if ([seen containsObject:word]) {
[dups addObject:word];
}
[seen addObject:word];
}
return [dups allObjects];
}
Assuming NSSet uses hash tables behind the scenes (which I'm betting it does), this is going to be faster than the previously suggested O(n^2) solution.
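As a rough sketch of the same hash-based idea, the counting could also be delegated to NSCountedSet. The helper name DuplicateWords is made up for illustration, and the sketch assumes ARC and a compiler with lightweight generics. The sort at the end is only there to make the result order predictable.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not Foundation API): returns each word that
// occurs at least twice in the input, listed once.
static NSArray<NSString *> *DuplicateWords(NSArray<NSString *> *words) {
    // NSCountedSet keeps one entry per distinct word plus its count
    NSCountedSet *counts = [[NSCountedSet alloc] initWithArray:words];
    NSMutableArray<NSString *> *dups = [NSMutableArray array];
    for (NSString *word in counts) {
        if ([counts countForObject:word] > 1) {
            [dups addObject:word];
        }
    }
    // Sets are unordered, so sort to get a stable result order
    [dups sortUsingSelector:@selector(compare:)];
    return dups;
}
```

Calling DuplicateWords on an array holding "apple", "pear", "apple", "banana", "banana" should give back "apple" and "banana".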
Here's something off the top of my head:
NSMutableSet* duplicates = [NSMutableSet set];
NSArray* words = [NSArray arrayWithObjects:@"Apple", @"Apple", @"Orange", @"Apple", @"Orange", @"Pear", nil];
[words enumerateObjectsUsingBlock:^(NSString* str, NSUInteger idx, BOOL *stop) {
    for (NSUInteger i = idx + 1; i < words.count; i++) {
        if ([str isEqualToString:[words objectAtIndex:i]]) {
            [duplicates addObject:str];
            break;
        }
    }
}];
NSLog(@"Dups: %@", [duplicates allObjects]); // Prints "Apple" and "Orange"
The use of an NSSet, as opposed to an NSArray, ensures strings are not added more than once. Obviously, there are optimizations that could be done, but it should be a good starting point.
I assume that you want to count appearances of words in your array and output those with a count of more than one. A basic and verbose way to do that would be:
// Make an array of words - some duplicates
NSArray *wordList = [[NSArray alloc] initWithObjects:
@"Apple", @"Banana", @"Pencil",
@"Steve Jobs", @"Kandahar",
@"Apple", @"Banana", @"Apple",
@"Pear", @"Pear", nil];
// Make a mutable dictionary - the key will be a word from the list
// and the value will be a number representing the number of times the
// word appears in the original array. It starts off empty.
NSMutableDictionary *wordCount = [[NSMutableDictionary alloc] init];
// In turn, take each word in the word list...
for (NSString *s in wordList) {
int count = 1;
// If the word is already in the dictionary
if([wordCount objectForKey:s]) {
// Increase the count by one
count = [[wordCount objectForKey:s] intValue] + 1;
}
// Save the word count in the dictionary
[wordCount setObject:[NSNumber numberWithInt:count] forKey:s];
}
// For each word...
for (NSString *s in [wordCount keysOfEntriesPassingTest:
^(id key, id obj, BOOL *stop) {
if ([obj intValue] > 1) return YES; else return NO;
}]) {
// print the word and the final count
NSLog(@"%2d %@", [[wordCount objectForKey:s] intValue], s);
}
The output would be:
3 Apple
2 Pear
2 Banana
How can I optimise out this nested for loop?
The program should go through each word in the array created from the word text file, and if it's greater than 8 characters, add it to the goodWords array. But the caveat is that I only want the root word to be in the goodWords array, for example:
If greet is added to the array, I don't want greets or greetings or greeters, etc.
NSString *string = [NSString stringWithContentsOfFile:@"/Users/james/dev/WordParser/word.txt" encoding:NSUTF8StringEncoding error:NULL];
NSArray *words = [string componentsSeparatedByString:@"\r\n"];
NSMutableArray *goodWords = [NSMutableArray array];
BOOL shouldAddToGoodWords = YES;
for (NSString *word in words)
{
    NSLog(@"Word: %@", word);
    if ([word length] > 8)
    {
        NSLog(@"Word is greater than 8");
        for (NSString *existingWord in [goodWords reverseObjectEnumerator])
        {
            NSLog(@"Existing Word: %@", existingWord);
            if ([word rangeOfString:existingWord].location != NSNotFound)
            {
                NSLog(@"Not adding...");
                shouldAddToGoodWords = NO;
                break;
            }
        }
        if (shouldAddToGoodWords)
        {
            NSLog(@"Adding word: %@", word);
            [goodWords addObject:word];
        }
    }
    shouldAddToGoodWords = YES;
}
How about something like this?
//load the words from wherever
NSString * allWords = [NSString stringWithContentsOfFile:@"/usr/share/dict/words" encoding:NSUTF8StringEncoding error:NULL];
//create a mutable array of the words
NSMutableArray * words = [[allWords componentsSeparatedByCharactersInSet:[NSCharacterSet newlineCharacterSet]] mutableCopy];
//remove any words that are shorter than 8 characters
[words filterUsingPredicate:[NSPredicate predicateWithFormat:@"length >= 8"]];
//sort the words in ascending order
[words sortUsingSelector:@selector(caseInsensitiveCompare:)];
//create a set of indexes (these will be the non-root words)
NSMutableIndexSet * badIndexes = [NSMutableIndexSet indexSet];
//remember our current root word
NSString * currentRoot = nil;
NSUInteger count = [words count];
//loop through the words
for (NSUInteger i = 0; i < count; ++i) {
NSString * word = [words objectAtIndex:i];
if (currentRoot == nil) {
//base case
currentRoot = word;
} else if ([word hasPrefix:currentRoot]) {
//word is a non-root word. remember this index to remove it later
[badIndexes addIndex:i];
} else {
//no match. this word is our new root
currentRoot = word;
}
}
//remove the non-root words
[words removeObjectsAtIndexes:badIndexes];
NSLog(@"%@", words);
[words release];
This runs very very quickly on my machine (2.8GHz MBP).
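A trie seems suitable for your purpose. Like a hash table, it supports fast lookups by string key, but because it stores keys character by character in a tree, it is also useful for detecting whether a given string is a prefix of an already seen string.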
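A minimal sketch of that idea follows. It assumes ARC, assumes the input is ordered so that a root word appears before the longer words built on it (a sorted word file satisfies this), compares case-sensitively, and leaves out the length filter from the question. TrieNode and RootWords are hypothetical names, not Foundation API.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical trie node: one child per UTF-16 code unit.
@interface TrieNode : NSObject
@property (nonatomic, strong) NSMutableDictionary<NSNumber *, TrieNode *> *children;
@property (nonatomic, assign) BOOL endsWord; // YES if a kept word ends here
@end

@implementation TrieNode
- (instancetype)init {
    if ((self = [super init])) {
        _children = [NSMutableDictionary dictionary];
    }
    return self;
}
@end

// Keeps a word only if no previously kept word is a prefix of it.
static NSArray<NSString *> *RootWords(NSArray<NSString *> *orderedWords) {
    TrieNode *root = [[TrieNode alloc] init];
    NSMutableArray<NSString *> *kept = [NSMutableArray array];
    for (NSString *word in orderedWords) {
        if (word.length == 0) continue; // skip blank lines
        TrieNode *node = root;
        BOOL hasKeptPrefix = NO;
        for (NSUInteger i = 0; i < word.length; i++) {
            NSNumber *key = @([word characterAtIndex:i]);
            TrieNode *next = node.children[key];
            if (next == nil) {
                next = [[TrieNode alloc] init];
                node.children[key] = next;
            }
            node = next;
            if (node.endsWord) { // a kept word ends on this path
                hasKeptPrefix = YES;
                break;
            }
        }
        if (!hasKeptPrefix) {
            node.endsWord = YES;
            [kept addObject:word];
        }
    }
    return kept;
}
```

Each word costs one walk proportional to its own length, regardless of how many words have already been kept, which is where the gain over the nested loop would come from.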
I used an NSSet to ensure that only one copy of a word is ever added. A word is only a candidate if the NSSet does not already contain it. The code then checks whether any word that has already been added is a substring of the new word; if so, it won't add the new word. The comparison is case-insensitive as well.
What I've written is a refactoring of your code. It's probably not much faster, but you really do want a tree data structure if you want to search the words that have already been added a lot faster.
Take a look at red-black trees or B-trees.
Words.txt
objective
objectively
cappucin
cappucino
cappucine
programme
programmer
programmatic
programmatically
Source Code
- (void)addRootWords {
    NSString *textFile = [[NSBundle mainBundle] pathForResource:@"words" ofType:@"txt"];
    NSString *string = [NSString stringWithContentsOfFile:textFile encoding:NSUTF8StringEncoding error:NULL];
    NSArray *wordFile = [string componentsSeparatedByString:@"\n"];
    NSMutableSet *goodWords = [[NSMutableSet alloc] init];
    for (NSString *newWord in wordFile)
    {
        NSLog(@"Word: %@", newWord);
        if ([newWord length] > 8)
        {
            NSLog(@"Word '%@' contains more than 8 characters", newWord);
            BOOL shouldAddWord = NO;
            if ( [goodWords containsObject:newWord] == NO) {
                shouldAddWord = YES;
            }
            for (NSString *existingWord in goodWords)
            {
                NSRange textRange = [[newWord lowercaseString] rangeOfString:[existingWord lowercaseString]];
                if( textRange.location != NSNotFound ) {
                    // newWord contains existingWord as a substring
                    shouldAddWord = NO;
                    break;
                }
                NSLog(@"(word:%@) does not contain (substring:%@)", newWord, existingWord);
                shouldAddWord = YES;
            }
            if (shouldAddWord) {
                NSLog(@"Adding word: %@", newWord);
                [goodWords addObject:newWord];
            }
        }
    }
    NSLog(@"***Added words***");
    int count = 1;
    for (NSString *word in goodWords) {
        NSLog(@"%d: %@", count, word);
        count++;
    }
    [goodWords release];
}
Output:
***Added words***
1: cappucino
2: programme
3: objective
4: programmatic
5: cappucine