iOS - Most efficient way to find word occurrence count in a string - objective-c

Given a string, I need to obtain a count of each word that appears in that string. To do so, I extracted the string into an array, by word, and searched that way, but I have the feeling that searching the string directly is more optimal. Below is the code that I originally wrote to solve the problem. I'm up for suggestions on better solutions though.
NSMutableDictionary *sets = [[NSMutableDictionary alloc] init];
NSString *paragraph = [[NSString alloc] initWithContentsOfFile:[[NSBundle mainBundle] pathForResource:@"text" ofType:@"txt"] encoding:NSUTF8StringEncoding error:NULL];
NSMutableArray *words = [[[paragraph lowercaseString] componentsSeparatedByString:@" "] mutableCopy];
while (words.count) {
    NSMutableIndexSet *indexSet = [[NSMutableIndexSet alloc] init];
    NSString *search = [words objectAtIndex:0];
    for (unsigned i = 0; i < words.count; i++) {
        if ([[words objectAtIndex:i] isEqualToString:search]) {
            [indexSet addIndex:i];
        }
    }
    [sets setObject:[NSNumber numberWithInt:indexSet.count] forKey:search];
    [words removeObjectsAtIndexes:indexSet];
}
NSLog(@"%@", sets);
Example:
Starting string:
"This is a test. This is only a test."
Results:
"This" - 2
"is" - 2
"a" - 2
"test" - 2
"only" - 1

This is exactly what an NSCountedSet is for.
You need to break the string apart into words (which iOS is nice enough to give us a function for so that we don't have to worry about punctuation) and just add each of them to the counted set, which keeps track of the number of times each object appears in the set:
NSString *string = @"This is a test. This is only a test.";
NSCountedSet *countedSet = [NSCountedSet new];
[string enumerateSubstringsInRange:NSMakeRange(0, [string length])
                           options:NSStringEnumerationByWords | NSStringEnumerationLocalized
                        usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop){
    // This block is called once for each word in the string.
    [countedSet addObject:substring];
    // If you want to ignore case, so that "this" and "This"
    // are counted the same, use this line instead to convert
    // each word to lowercase first:
    // [countedSet addObject:[substring lowercaseString]];
}];
NSLog(@"%@", countedSet);
// Results: 2012-11-13 14:01:10.567 Testing App[35767:fb03]
// <NSCountedSet: 0x885df70> (a [2], only [1], test [2], This [2], is [2])
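To read the counts back out of the counted set, countForObject: returns how many times a given object was added. A minimal sketch, assuming a helper function named WordCounts (a hypothetical name, not part of Foundation) that lowercases each word and copies the counts into a dictionary:

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: maps each lowercased word in `text`
// to the number of times it occurs.
static NSDictionary<NSString *, NSNumber *> *WordCounts(NSString *text) {
    NSCountedSet *countedSet = [NSCountedSet new];
    [text enumerateSubstringsInRange:NSMakeRange(0, text.length)
                             options:NSStringEnumerationByWords
                          usingBlock:^(NSString *substring, NSRange substringRange,
                                       NSRange enclosingRange, BOOL *stop) {
        [countedSet addObject:substring.lowercaseString];
    }];

    // Enumerating a counted set visits each distinct object once.
    NSMutableDictionary<NSString *, NSNumber *> *counts = [NSMutableDictionary dictionary];
    for (NSString *word in countedSet) {
        counts[word] = @([countedSet countForObject:word]);
    }
    return [counts copy];
}
```

Calling WordCounts(@"This is a test. This is only a test.") should give a dictionary with five keys, since "This" and "this" are folded together.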

If I had to guess, I would say NSRegularExpression for that. Like this:
NSUInteger numberOfMatches = [regex numberOfMatchesInString:string
options:0
range:NSMakeRange(0, [string length])];
That snippet was taken from here.
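The snippet above leaves regex undefined. A minimal sketch of how it could be set up to count one word, assuming a hypothetical helper named CountOccurrencesOfWord and assuming that whole-word, case-insensitive matching is what is wanted:

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: counts whole-word, case-insensitive
// occurrences of `word` in `text`.
static NSUInteger CountOccurrencesOfWord(NSString *word, NSString *text) {
    // Escape the word so regex metacharacters in it are matched literally,
    // and wrap it in \b so it only matches at word boundaries.
    NSString *pattern = [NSString stringWithFormat:@"\\b%@\\b",
                         [NSRegularExpression escapedPatternForString:word]];
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        return 0; // The pattern could not be compiled; `error` says why.
    }
    return [regex numberOfMatchesInString:text
                                  options:0
                                    range:NSMakeRange(0, text.length)];
}
```

Note that this counts a single word per pass over the string, so counting every distinct word this way needs one pass per word.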
Edit 1.0:
Based on what Sir Till said:
NSString *string = @"This is a test, so it is a test";
NSMutableDictionary *dictionary = [NSMutableDictionary dictionary];
NSArray *arrayOfWords = [string componentsSeparatedByCharactersInSet:[NSCharacterSet whitespaceCharacterSet]];
for (NSString *word in arrayOfWords)
{
    if ([dictionary objectForKey:word])
    {
        NSNumber *numberOfOccurences = [dictionary objectForKey:word];
        NSNumber *increment = [NSNumber numberWithInt:(1 + [numberOfOccurences intValue])];
        [dictionary setValue:increment forKey:word];
    }
    else
    {
        [dictionary setValue:[NSNumber numberWithInt:1] forKey:word];
    }
}
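You should be careful with:
Punctuation marks attached to words: splitting on whitespace alone counts "test," and "test" as different words.
Uppercase vs. lowercase words: "This" and "this" are counted separately.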

I think it's a really bad idea to search for words in a long paragraph with a loop. You should use a regular expression for that. Regular expressions are not easy to learn at first, but they are really worth knowing. Take a look at this question: Use regular expression to find/replace substring in NSString

Related

Take all numbers separated by spaces from a string and place in an array

I have a NSString formatted like this:
"Hello world 12 looking for some 56"
I want to find all instances of numbers separated by whitespace and place them in an NSArray. I don't want to remove the numbers from the string, though.
What's the best way of achieving this?
This is a solution using regular expression as suggested in the comment.
NSString *string = @"Hello world 12 looking for some 56";
// options: takes a bit mask, so pass 0 (not nil) for "no options".
NSRegularExpression *expression = [NSRegularExpression regularExpressionWithPattern:@"\\b\\d+" options:0 error:nil];
NSArray *matches = [expression matchesInString:string options:0 range:NSMakeRange(0, string.length)];
NSMutableArray *result = [[NSMutableArray alloc] init];
for (NSTextCheckingResult *match in matches) {
    [result addObject:[string substringWithRange:match.range]];
}
NSLog(@"%@", result);
First split the string into an array using NSString's componentsSeparatedByString: method (see this SO question). Then iterate over the array and check whether each element is a number (see this SO question: Checking if NSString is Integer).
I don't know where you plan to run this, but be aware that it may not be fast, depending on the size of the string (for example, if it's called while configuring a table cell, scrolling may become choppy).
Code:
+ (NSArray *)getNumbersFromString:(NSString *)str {
    NSMutableArray *retVal = [NSMutableArray array];
    NSCharacterSet *numericSet = [NSCharacterSet decimalDigitCharacterSet];
    NSMutableString *placeholder = [NSMutableString string];
    for (NSUInteger i = 0; i < [str length]; i++) {
        unichar currentChar = [str characterAtIndex:i];
        if ([numericSet characterIsMember:currentChar]) {
            // Digit: append it to the number currently being collected.
            [placeholder appendString:[NSString stringWithCharacters:&currentChar length:1]];
        } else if ([placeholder length] > 0) {
            // Non-digit after a run of digits: store the number and start over.
            [retVal addObject:@([placeholder intValue])];
            [placeholder setString:@""];
        }
    }
    // Store a number that runs up to the very end of the string.
    if ([placeholder length] > 0) [retVal addObject:@([placeholder intValue])];
    return [retVal copy];
}
To explain what is happening above, essentially I am:
going through every character until I find a digit
appending that digit, and any digits that follow it, to a string
adding the collected number to the array once a non-digit (or the end of the string) is reached
Hope this helps. Please ask for clarification if needed.

Split string into parts

I want to split an NSString into an array of fixed-length parts. How can I do this?
I searched for it, but I only found the componentsSeparatedByString: method and nothing more. It can also be done manually, but is there a faster way to do this?
Depends what you mean by "faster" - if it is processor performance you refer to, I'd guess that it is hard to beat substringWithRange:, but for robust, easy coding of a problem like this, regular expressions can actually come in quite handy.
Here's one that can be used to divide a string into 10-char chunks, allowing the last chunk to be of less than 10 chars:
NSString *pattern = @".{1,10}";
Unfortunately, the Cocoa implementation of the regex machinery is less elegant, but simple enough to use:
NSString *string = @"I want to split NSString into array with fixed-length parts. How can i do this?";
NSError *error = NULL;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:&error];
NSArray *matches = [regex matchesInString:string options:0 range:NSMakeRange(0, [string length])];
NSMutableArray *result = [NSMutableArray array];
for (NSTextCheckingResult *match in matches) {
    [result addObject:[string substringWithRange:match.range]];
}
Break the string into a sequence of NSRanges and then try using NSString's substringWithRange: method.
You can split a string in different ways.
One way is to split by spaces (or any other character):
NSString *string = @"Hello World Obj C is Awesome";
NSArray *words = [string componentsSeparatedByString:@" "];
You can also split at exact points in a string:
NSString *word = [string substringWithRange:NSMakeRange(startPoint, FIXED_LENGTH)];
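Simply do this in a loop that advances by the fixed length, and save each part to a mutable array:
NSMutableArray *words = [NSMutableArray array];
for (NSUInteger i = 0; i < [string length]; i += FIXED_LENGTH) { // you may want to make FIXED_LENGTH a #define
    // The last part may be shorter than FIXED_LENGTH, so clamp the range.
    NSUInteger length = MIN(FIXED_LENGTH, [string length] - i);
    [words addObject:[string substringWithRange:NSMakeRange(i, length)]];
}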
Hope this helps.

Replace specific words in NSString

What is the best way to find and replace specific words in a string?
For example, I have
NSString *currentString = @"one {two}, thing {thing} good";
Now I need to find each {currentWord}
and apply a function to it
[self replaceWord:currentWord]
then replace currentWord with the result of the function
-(NSString*)replaceWord:(NSString*)currentWord;
The following example shows how you can use NSRegularExpression and enumerateMatchesInString to accomplish the task. I have just used uppercaseString as function that replaces a word, but you can use your replaceWord method as well:
EDIT: The first version of my answer did not work correctly if the replaced words were shorter or longer than the original words (thanks to Fabian Kreiser for noting that!). Now it should work correctly in all cases.
NSString *currentString = @"one {two}, thing {thing} good";
// Regular expression to find "word characters" enclosed by {...}:
NSRegularExpression *regex;
regex = [NSRegularExpression regularExpressionWithPattern:@"\\{(\\w+)\\}"
                                                  options:0
                                                    error:NULL];
NSMutableString *modifiedString = [currentString mutableCopy];
__block int offset = 0;
[regex enumerateMatchesInString:currentString
                        options:0
                          range:NSMakeRange(0, [currentString length])
                     usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop) {
    // range = location of the regex capture group "(\\w+)" in currentString:
    NSRange range = [result rangeAtIndex:1];
    // Adjust location for modifiedString:
    range.location += offset;
    // Get old word:
    NSString *oldWord = [modifiedString substringWithRange:range];
    // Compute new word:
    // In your case, that would be
    // NSString *newWord = [self replaceWord:oldWord];
    NSString *newWord = [NSString stringWithFormat:@"--- %@ ---", [oldWord uppercaseString]];
    // Replace old word with new word in modifiedString:
    [modifiedString replaceCharactersInRange:range withString:newWord];
    // Update offset:
    offset += [newWord length] - [oldWord length];
}];
NSLog(@"%@", modifiedString);
Output:
one {--- TWO ---}, thing {--- THING ---} good

Extracting sentences containing keywords objective c

I have a block of text (a newspaper article, if it's of any relevance) and was wondering if there is a way to extract all sentences containing a particular keyword in Objective-C. I've been looking a bit at ParseKit but am not having much luck!
You can enumerate sentences using native NSString methods like this...
NSString *string = @"your text";
NSMutableArray *sentences = [NSMutableArray array];
[string enumerateSubstringsInRange:NSMakeRange(0, string.length)
                           options:NSStringEnumerationBySentences
                        usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
    // Check that this sentence has the string you are looking for
    NSRange range = [substring rangeOfString:@"The text you are looking for"];
    if (range.location != NSNotFound) {
        [sentences addObject:substring];
    }
}];
for (NSString *sentence in sentences) {
    NSLog(@"%@", sentence);
}
At the end you will have an array of sentences all containing the text you were looking for.
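The search above is case-sensitive. A minimal sketch of a case-insensitive variant, wrapped in a hypothetical helper named SentencesContainingKeyword (the name and signature are assumptions for illustration):

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the sentences of `text` that contain
// `keyword`, ignoring case.
static NSArray<NSString *> *SentencesContainingKeyword(NSString *text, NSString *keyword) {
    NSMutableArray<NSString *> *sentences = [NSMutableArray array];
    [text enumerateSubstringsInRange:NSMakeRange(0, text.length)
                             options:NSStringEnumerationBySentences
                          usingBlock:^(NSString *substring, NSRange substringRange,
                                       NSRange enclosingRange, BOOL *stop) {
        NSRange range = [substring rangeOfString:keyword
                                         options:NSCaseInsensitiveSearch];
        if (range.location != NSNotFound) {
            [sentences addObject:substring];
        }
    }];
    return [sentences copy];
}
```

This is still a plain substring match, so a keyword such as "happy" would also match a sentence containing "unhappy".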
Edit: As noted in the comments, there are some inherent weaknesses in my solution, as it requires perfectly formatted text in which period + space is used only where a sentence actually ends. I'll leave it here, as it could be viable for people splitting text on another (known) separator.
Here's another way of achieving what you want:
NSString *wordYouAreLookingFor = @"happy";
NSArray *arrayOfSentences = [aString componentsSeparatedByString:@". "]; // get the single sentences
NSMutableArray *sentencesWithMatchingWord = [[NSMutableArray alloc] init];
for (NSString *singleSentence in arrayOfSentences) {
    NSInteger originalSize = [singleSentence length];
    NSString *possibleNewString = [singleSentence stringByReplacingOccurrencesOfString:wordYouAreLookingFor withString:@""];
    if (originalSize != [possibleNewString length]) {
        [sentencesWithMatchingWord addObject:singleSentence];
    }
}

Objective-C Find the most commonly used words in an NSString

I am trying to write a method:
- (NSDictionary *)wordFrequencyFromString:(NSString *)string {}
where the dictionary returned will have the words and how often they were used in the string provided. Unfortunately, I can't seem to find a way to iterate through words in a string to analyze each one - only each character which seems like a bit more work than necessary. Any suggestions?
NSString has the -enumerateSubstringsInRange:options:usingBlock: method, which lets you enumerate all words directly and leaves it to the standard API to determine word boundaries and so on:
[s enumerateSubstringsInRange:NSMakeRange(0, [s length])
                      options:NSStringEnumerationByWords
                   usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
    NSLog(@"%@", substring);
}];
In the enumeration block you can use either NSDictionary with words as keys and NSNumber as their counts, or use NSCountedSet that provides required functionality for counts.
You can use componentsSeparatedByCharactersInSet: to split the string and NSCountedSet will count the words for you.
1) Split the string into words using a combination of the punctuation, whitespace and new line character sets:
NSMutableCharacterSet *separators = [NSMutableCharacterSet punctuationCharacterSet];
[separators formUnionWithCharacterSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
NSArray *words = [myString componentsSeparatedByCharactersInSet:separators];
2) Count the occurrences of the words (if you want to disregard capitalization, you can do NSString *myString = [originalString lowercaseString]; before splitting the string into components):
NSCountedSet *frequencies = [NSCountedSet setWithArray:words];
NSUInteger aWordCount = [frequencies countForObject:@"word"];
If you are willing to change your method signature, you can just return the counted set.
Split the string into an array of words using -[NSString componentsSeparatedByCharactersInSet:] first. (Use [[NSCharacterSet letterCharacterSet] invertedSet] as the argument to split on all non-letter characters.)
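One thing to watch for with this approach: componentsSeparatedByCharactersInSet: returns an empty string between two adjacent separators (for example between a period and the space after it). A minimal sketch that skips those before counting, using a hypothetical helper named LetterWordFrequencies:

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: splits `text` on every non-letter character
// and counts the resulting words.
static NSCountedSet *LetterWordFrequencies(NSString *text) {
    NSCharacterSet *nonLetters = [[NSCharacterSet letterCharacterSet] invertedSet];
    NSArray<NSString *> *parts = [text componentsSeparatedByCharactersInSet:nonLetters];

    NSCountedSet *frequencies = [NSCountedSet new];
    for (NSString *part in parts) {
        // Adjacent separators (such as ". ") yield empty strings; skip them.
        if (part.length > 0) {
            [frequencies addObject:part];
        }
    }
    return frequencies;
}
```

Because every non-letter is treated as a separator, words such as "don't" are split into two parts, which may or may not be acceptable for a given text.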
I used the following approach to get the most common word from an NSString.
-(void)countMostFrequentWordInSpeech:(NSString*)speechString
{
    NSString *string = speechString;
    NSCountedSet *countedSet = [NSCountedSet new];
    [string enumerateSubstringsInRange:NSMakeRange(0, [string length])
                               options:NSStringEnumerationByWords | NSStringEnumerationLocalized
                            usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop){
        [countedSet addObject:substring];
    }];
    // NSLog(@"%@", countedSet);
    // Sort the counted set so that the most frequent word ends up at index 0 of the resulting array
    NSMutableArray *dictArray = [NSMutableArray array];
    [countedSet enumerateObjectsUsingBlock:^(id obj, BOOL *stop) {
        [dictArray addObject:@{@"object": obj,
                               @"count": @([countedSet countForObject:obj])}];
    }];
    NSArray *sortedArrayOfWord = [dictArray sortedArrayUsingDescriptors:@[[NSSortDescriptor sortDescriptorWithKey:@"count" ascending:NO]]];
    if (sortedArrayOfWord.count > 0)
    {
        self.mostFrequentWordLabel.text = [NSString stringWithFormat:@"Frequent Word: %@", [[sortedArrayOfWord[0] valueForKey:@"object"] capitalizedString]];
    }
}
"speechString" is the string from which I need the most frequent word. The object at index 0 of the array "sortedArrayOfWord" is the most common word.