Objective-C Find the most commonly used words in an NSString - objective-c

I am trying to write a method:
- (NSDictionary *)wordFrequencyFromString:(NSString *)string {}
where the dictionary returned will have the words and how often they were used in the string provided. Unfortunately, I can't seem to find a way to iterate through words in a string to analyze each one - only each character which seems like a bit more work than necessary. Any suggestions?

NSString has an -enumerateSubstringsInRange:options:usingBlock: method that enumerates all words directly, letting the standard API take care of details such as defining word boundaries:
[s enumerateSubstringsInRange:NSMakeRange(0, [s length])
options:NSStringEnumerationByWords
usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
NSLog(@"%@", substring);
}];
In the enumeration block you can use either NSDictionary with words as keys and NSNumber as their counts, or use NSCountedSet that provides required functionality for counts.
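As a minimal sketch of how the enumeration and NSCountedSet could be combined to produce the dictionary the question asks for: the function name WordFrequencyFromString is hypothetical, and lowercasing each word is an assumption about how case should be treated.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of Foundation): maps each lowercased word
// in `string` to the number of times it occurs.
static NSDictionary<NSString *, NSNumber *> *WordFrequencyFromString(NSString *string) {
    NSCountedSet *counts = [NSCountedSet set];
    [string enumerateSubstringsInRange:NSMakeRange(0, string.length)
                               options:NSStringEnumerationByWords
                            usingBlock:^(NSString *substring, NSRange substringRange,
                                         NSRange enclosingRange, BOOL *stop) {
        // Assumption: "This" and "this" should count as the same word.
        [counts addObject:substring.lowercaseString];
    }];

    // Copy the counts into a plain dictionary of word -> NSNumber.
    NSMutableDictionary<NSString *, NSNumber *> *result =
        [NSMutableDictionary dictionaryWithCapacity:counts.count];
    for (NSString *word in counts) {
        result[word] = @([counts countForObject:word]);
    }
    return [result copy];
}
```

Inside a class, the body of this function could be used for -wordFrequencyFromString: unchanged.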

You can use componentsSeparatedByCharactersInSet: to split the string and NSCountedSet will count the words for you.
1) Split the string into words using a combination of the punctuation, whitespace and new line character sets:
NSMutableCharacterSet *separators = [NSMutableCharacterSet punctuationCharacterSet];
[separators formUnionWithCharacterSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
NSArray *words = [myString componentsSeparatedByCharactersInSet:separators];
2) Count the occurrences of the words (if you want to disregard capitalization, you can do NSString *myString = [originalString lowercaseString]; before splitting the string into components):
NSCountedSet *frequencies = [NSCountedSet setWithArray:words];
NSUInteger aWordCount = [frequencies countForObject:@"word"];
If you are willing to change your method signature, you can just return the counted set.

Split the string into an array of words using -[NSString componentsSeparatedByCharactersInSet:] first. (Use [[NSCharacterSet letterCharacterSet] invertedSet] as the argument to split on all non-letter characters.)
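One thing to watch with this approach is that adjacent separators (for example a comma followed by a space) produce empty strings in the result. A minimal sketch that filters them out follows; the helper name LetterWordsInString is hypothetical.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: splits on every non-letter character and drops the
// empty strings that appear between adjacent separators.
static NSArray<NSString *> *LetterWordsInString(NSString *string) {
    NSCharacterSet *nonLetters = [[NSCharacterSet letterCharacterSet] invertedSet];
    NSArray<NSString *> *pieces = [string componentsSeparatedByCharactersInSet:nonLetters];
    NSPredicate *notEmpty = [NSPredicate predicateWithFormat:@"length > 0"];
    return [pieces filteredArrayUsingPredicate:notEmpty];
}
```

Note that splitting on all non-letters also breaks contractions apart: "don't" becomes "don" and "t".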

I used the following approach to get the most common word from an NSString.
-(void)countMostFrequentWordInSpeech:(NSString*)speechString
{
NSString *string = speechString;
NSCountedSet *countedSet = [NSCountedSet new];
[string enumerateSubstringsInRange:NSMakeRange(0, [string length])
options:NSStringEnumerationByWords | NSStringEnumerationLocalized
usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop){
[countedSet addObject:substring];
}];
// NSLog(@"%@", countedSet);
//Sort CountedSet & get most frequent common word at 0th index of resultant array
NSMutableArray *dictArray = [NSMutableArray array];
[countedSet enumerateObjectsUsingBlock:^(id obj, BOOL *stop) {
[dictArray addObject:@{@"object": obj,
@"count": @([countedSet countForObject:obj])}];
}];
NSArray *sortedArrayOfWord= [dictArray sortedArrayUsingDescriptors:@[[NSSortDescriptor sortDescriptorWithKey:@"count" ascending:NO]]];
if (sortedArrayOfWord.count>0)
{
self.mostFrequentWordLabel.text=[NSString stringWithFormat:@"Frequent Word: %@", [[sortedArrayOfWord[0] valueForKey:@"object"] capitalizedString]];
}
}
"speechString" is the string from which I want the most frequent words. The object at index 0 of the "sortedArrayOfWord" array is the most common word.

Related

Split string into parts

I want to split an NSString into an array of fixed-length parts. How can I do this?
I searched for it, but I only found the componentsSeparatedByString: method and nothing more. It can also be done manually, but is there a faster way to do this?
Depends what you mean by "faster" - if it is processor performance you refer to, I'd guess that it is hard to beat substringWithRange:, but for robust, easy coding of a problem like this, regular expressions can actually come in quite handy.
Here's one that can be used to divide a string into 10-char chunks, allowing the last chunk to be of less than 10 chars:
NSString *pattern = @".{1,10}";
Unfortunately, the Cocoa implementation of the regex machinery is less elegant, but simple enough to use:
NSString *string = @"I want to split NSString into array with fixed-length parts. How can i do this?";
NSError *error = NULL;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern: pattern options: 0 error: &error];
NSArray *matches = [regex matchesInString:string options:0 range:NSMakeRange(0, [string length])];
NSMutableArray *result = [NSMutableArray array];
for (NSTextCheckingResult *match in matches) {
[result addObject: [string substringWithRange: match.range]];
}
Break the string into a sequence of NSRanges and then try using NSString's substringWithRange: method.
You can split a string in different ways.
One way is to split by spaces(or any character):
NSString *string = @"Hello World Obj C is Awesome";
NSArray *words = [string componentsSeparatedByString:@" "];
You can also split at exact points in a string:
NSString *word = [string substringWithRange:NSMakeRange(startPoint, FIXED_LENGTH)];
Simply put it in a loop for a fixed length and save to Mutable Array:
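NSMutableArray *words = [NSMutableArray array];
for (NSUInteger i = 0; i < [string length]; i += FIXED_LENGTH) { // FIXED_LENGTH could be a #define
NSUInteger length = MIN(FIXED_LENGTH, [string length] - i); // the last chunk may be shorter
NSString *word = [string substringWithRange:NSMakeRange(i, length)];
[words addObject:word];
}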
Hope this helps.

Is there a simple way to split a NSString into an array of characters?

Is there a simple way to split a NSString into an array of characters? It would actually be best if the resulting type were a collection of NSString's themselves, just one character each.
Yes, I know I can do this in a loop, but I'm wondering if there is a faster way to do this with any existing methods or functions the way you can with LINQ in C#.
e.g.
// I have this...
NSString * fooString = @"Hello";
// And want this...
NSArray * fooChars; // <-- Contains the NSStrings, @"H", @"e", @"l", @"l" and @"o"
You could do something like this (if you want to use enumerators)
NSString *fooString = @"Hello";
NSMutableArray *characters = [[NSMutableArray alloc] initWithCapacity:[fooString length]];
[fooString enumerateSubstringsInRange:NSMakeRange(0, fooString.length)
options:NSStringEnumerationByComposedCharacterSequences
usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
[characters addObject:substring];
}];
And if you really wanted it in an NSArray finally
NSArray *fooChars = [NSArray arrayWithArray:characters];
Keep in mind that some characters, such as emoji, may span more than one index.
Here's a category method for NSString
@implementation NSString (SplitString)
- (NSArray *)splitString
{
NSUInteger index = 0;
NSMutableArray *array = [NSMutableArray arrayWithCapacity:self.length];
while (index < self.length) {
NSRange range = [self rangeOfComposedCharacterSequenceAtIndex:index];
NSString *substring = [self substringWithRange:range];
[array addObject:substring];
index = range.location + range.length;
}
return array;
}
@end
Convert it to NSData; [data bytes] will then give you a C buffer in the encoding that you pick, [data length] bytes long.
Try this
NSMutableArray *array = [NSMutableArray array];
NSString *str = @"Hello";
for (int i = 0; i < [str length]; i++) {
NSString *ch = [str substringWithRange:NSMakeRange(i, 1)];
[array addObject:ch];
}

Extracting sentences containing keywords objective c

I have a block of text (a newspaper article, if it's of any relevance) and was wondering if there is a way to extract all sentences containing a particular keyword in Objective-C. I've been looking a bit at ParseKit but am not having much luck!
You can enumerate sentences using native NSString methods like this...
NSString *string = @"your text";
NSMutableArray *sentences = [NSMutableArray array];
[string enumerateSubstringsInRange:NSMakeRange(0, string.length)
options:NSStringEnumerationBySentences
usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
//check that this sentence has the string you are looking for
NSRange range = [substring rangeOfString:@"The text you are looking for"];
if (range.location != NSNotFound) {
[sentences addObject:substring];
}
}];
for (NSString *sentence in sentences) {
NSLog(@"%@", sentence);
}
At the end you will have an array of sentences all containing the text you were looking for.
Edit: As noted in the comments, there are some inherent weaknesses in my solution, as it requires perfectly formatted text where period + space is only used when actually ending a sentence... I'll leave it here, as it could be viable for people splitting a text on another (known) separator.
Here's another way of achieving what you want:
NSString *wordYouAreLookingFor = @"happy";
NSArray *arrayOfSentences = [aString componentsSeparatedByString:@". "]; // get the single sentences
NSMutableArray *sentencesWithMatchingWord = [[NSMutableArray alloc] init];
for (NSString *singleSentence in arrayOfSentences) {
NSUInteger originalSize = [singleSentence length];
NSString *possibleNewString = [singleSentence stringByReplacingOccurrencesOfString:wordYouAreLookingFor withString:@""];
if (originalSize != [possibleNewString length]) {
[sentencesWithMatchingWord addObject:singleSentence];
}
}

iOS - Most efficient way to find word occurrence count in a string

Given a string, I need to obtain a count of each word that appears in that string. To do so, I extracted the string into an array, by word, and searched that way, but I have the feeling that searching the string directly is more optimal. Below is the code that I originally wrote to solve the problem. I'm up for suggestions on better solutions though.
NSMutableDictionary *sets = [[NSMutableDictionary alloc] init];
NSString *paragraph = [[NSString alloc] initWithContentsOfFile:[[NSBundle mainBundle] pathForResource:@"text" ofType:@"txt"] encoding:NSUTF8StringEncoding error:NULL];
NSMutableArray *words = [[[paragraph lowercaseString] componentsSeparatedByString:@" "] mutableCopy];
while (words.count) {
NSMutableIndexSet *indexSet = [[NSMutableIndexSet alloc] init];
NSString *search = [words objectAtIndex:0];
for (unsigned i = 0; i < words.count; i++) {
if ([[words objectAtIndex:i] isEqualToString:search]) {
[indexSet addIndex:i];
}
}
[sets setObject:[NSNumber numberWithUnsignedInteger:indexSet.count] forKey:search];
[words removeObjectsAtIndexes:indexSet];
}
NSLog(@"%@", sets);
Example:
Starting string:
"This is a test. This is only a test."
Results:
"This" - 2
"is" - 2
"a" - 2
"test" - 2
"only" - 1
This is exactly what an NSCountedSet is for.
You need to break the string apart into words (which iOS is nice enough to give us a function for so that we don't have to worry about punctuation) and just add each of them to the counted set, which keeps track of the number of times each object appears in the set:
NSString *string = #"This is a test. This is only a test.";
NSCountedSet *countedSet = [NSCountedSet new];
[string enumerateSubstringsInRange:NSMakeRange(0, [string length])
options:NSStringEnumerationByWords | NSStringEnumerationLocalized
usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop){
// This block is called once for each word in the string.
[countedSet addObject:substring];
// If you want to ignore case, so that "this" and "This"
// are counted the same, use this line instead to convert
// each word to lowercase first:
// [countedSet addObject:[substring lowercaseString]];
}];
NSLog(@"%@", countedSet);
// Results: 2012-11-13 14:01:10.567 Testing App[35767:fb03]
// <NSCountedSet: 0x885df70> (a [2], only [1], test [2], This [2], is [2])
If I had to guess, I would say NSRegularExpression for that. Like this:
NSUInteger numberOfMatches = [regex numberOfMatchesInString:string
options:0
range:NSMakeRange(0, [string length])];
That snippet was taken from here.
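The numberOfMatchesInString:options:range: call assumes a regex object already exists. As a rough sketch of how one might be built to count a single word, the following uses word-boundary anchors; the function name CountOccurrencesOfWord and the case-insensitive option are assumptions, not something the original answer specifies.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: counts whole-word, case-insensitive occurrences of
// `word` in `text`. Returns 0 if the pattern cannot be compiled.
static NSUInteger CountOccurrencesOfWord(NSString *word, NSString *text) {
    // Escape the word so characters such as "." are matched literally,
    // then wrap it in \b anchors so "is" does not match inside "This".
    NSString *pattern = [NSString stringWithFormat:@"\\b%@\\b",
                         [NSRegularExpression escapedPatternForString:word]];
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        return 0;
    }
    return [regex numberOfMatchesInString:text
                                  options:0
                                    range:NSMakeRange(0, text.length)];
}
```

This counts one word per pass over the text, so for a full frequency table a single enumeration into a counted set is likely the better fit.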
Edit 1.0:
Based on what Sir Till said:
NSString *string = @"This is a test, so it is a test";
NSMutableDictionary *dictionary = [NSMutableDictionary dictionary];
NSArray *arrayOfWords = [string componentsSeparatedByCharactersInSet:[NSCharacterSet whitespaceCharacterSet]];
for (NSString *word in arrayOfWords)
{
if ([dictionary objectForKey:word])
{
NSNumber *numberOfOccurences = [dictionary objectForKey:word];
NSNumber *increment = [NSNumber numberWithInt:(1 + [numberOfOccurences intValue])];
[dictionary setValue:increment forKey:word];
}
else
{
[dictionary setValue:[NSNumber numberWithInt:1] forKey:word];
}
}
You should be careful with:
Punctuation marks attached to words (with this split, "test," and "test" are counted separately).
Uppercase vs. lowercase words.
I think it's a really bad idea to search for words in a long paragraph with a loop. You should use a regular expression to do that! I know it's not easy to learn at first, but it's really worth knowing. Take a look at this case: Use regular expression to find/replace substring in NSString

Count of chars in NSString or NSMutableString?

I've tried this
NSCharacterSet *myCharSet = [NSCharacterSet characterSetWithCharactersInString: myString];
[myCharSet count];
But get a warning that NSCharacterSet may not respond to count. This is for desktop apps and not iPhone, which I think the above code works with.
I might be missing something here, but what's wrong with simply doing:
NSUInteger characterCount = [myString length];
To just get the number of characters in a string, I don't see any reason to mess around with NSCharacterSet.
That should not work on the iPhone either, as NSCharacterSet is not a subclass of NSSet on either platform.
If you really need to get a count why not subclass NSSet, add the value, then have a method that returns that as an NSCharacterSet on demand for use in anything that needs a character set?
NSString *string = @"0̄ 😄";
__block NSUInteger count = 0;
[string enumerateSubstringsInRange:NSMakeRange(0, [string length])
options:NSStringEnumerationByComposedCharacterSequences
usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
count++;
}];
NSLog(@"%ld %ld", (long)count, (long)[string length]);