Extracting sentences containing keywords objective c - objective-c

I have a block of text (a newspaper article, if that is of any relevance) and was wondering whether there is a way to extract all sentences containing a particular keyword in Objective-C. I've been looking a bit at ParseKit but am not having much luck!

You can enumerate sentences using native NSString methods like this...
NSString *string = @"your text";
NSMutableArray *sentences = [NSMutableArray array];
[string enumerateSubstringsInRange:NSMakeRange(0, string.length)
                           options:NSStringEnumerationBySentences
                        usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
    // Check that this sentence contains the string you are looking for
    NSRange range = [substring rangeOfString:@"The text you are looking for"];
    if (range.location != NSNotFound) {
        [sentences addObject:substring];
    }
}];
for (NSString *sentence in sentences) {
    NSLog(@"%@", sentence);
}
At the end you will have an array of sentences all containing the text you were looking for.

Edit: As noted in the comments, there are some inherent weaknesses in my solution: it requires perfectly formatted text in which period + space is used only to end sentences (an abbreviation such as "Mr. Smith" would be split in two). I'll leave it here, as it could be viable for people splitting a text on another (known) separator.
Here's another way of achieving what you want:
NSString *wordYouAreLookingFor = @"happy";
NSArray *arrayOfSentences = [aString componentsSeparatedByString:@". "]; // get the single sentences
NSMutableArray *sentencesWithMatchingWord = [[NSMutableArray alloc] init];
for (NSString *singleSentence in arrayOfSentences) {
    NSUInteger originalSize = [singleSentence length];
    NSString *possibleNewString = [singleSentence stringByReplacingOccurrencesOfString:wordYouAreLookingFor withString:@""];
    // If removing the word changed the length, the sentence contained it
    if (originalSize != [possibleNewString length]) {
        [sentencesWithMatchingWord addObject:singleSentence];
    }
}

Related

Take all numbers separated by spaces from a string and place in an array

I have an NSString formatted like this:
"Hello world 12 looking for some 56"
I want to find all instances of numbers separated by whitespace and place them in an NSArray. I don't want to remove the numbers from the string, though.
What's the best way of achieving this?
This is a solution using a regular expression, as suggested in the comment.
NSString *string = @"Hello world 12 looking for some 56";
NSRegularExpression *expression = [NSRegularExpression regularExpressionWithPattern:@"\\b\\d+" options:0 error:nil];
NSArray *matches = [expression matchesInString:string options:0 range:NSMakeRange(0, string.length)];
NSMutableArray *result = [[NSMutableArray alloc] init];
for (NSTextCheckingResult *match in matches) {
    [result addObject:[string substringWithRange:match.range]];
}
NSLog(@"%@", result);
First make an array using NSString's componentsSeparatedByString method and take reference to this SO question. Then iterate the array and refer to this SO question to check if an array element is number: Checking if NSString is Integer.
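The split-then-check approach described above could be sketched as follows. This is only a minimal sketch: the function name IntegerTokensInString is hypothetical, and NSScanner is one possible way to test whether a token is an integer (the linked answer may use a different check).

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the whitespace-separated tokens of `text`
// that consist entirely of an integer, in their original order.
static NSArray *IntegerTokensInString(NSString *text) {
    NSMutableArray *numbers = [NSMutableArray array];
    NSArray *tokens = [text componentsSeparatedByCharactersInSet:
                          [NSCharacterSet whitespaceAndNewlineCharacterSet]];
    for (NSString *token in tokens) {
        NSScanner *scanner = [NSScanner scannerWithString:token];
        [scanner setCharactersToBeSkipped:nil];
        NSInteger value = 0;
        // A token counts as a number only if the scanner consumes all of it,
        // so "12abc" and empty tokens are rejected.
        if ([scanner scanInteger:&value] && [scanner isAtEnd]) {
            [numbers addObject:token];
        }
    }
    return [numbers copy];
}
```

Calling IntegerTokensInString(@"Hello world 12 looking for some 56") leaves the original string untouched and returns the tokens "12" and "56".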
I don't know where you are looking to perform this action; depending on the string size it may not be fast (for example, if it's called for every table cell, scrolling may be choppy).
Code:
+ (NSArray *)getNumbersFromString:(NSString *)str {
    NSMutableArray *retVal = [NSMutableArray array];
    NSCharacterSet *numericSet = [NSCharacterSet decimalDigitCharacterSet];
    NSMutableString *placeholder = [NSMutableString string];
    for (NSUInteger i = 0; i < [str length]; i++) {
        unichar currentChar = [str characterAtIndex:i];
        if ([numericSet characterIsMember:currentChar]) {
            // Extend the current run of digits
            [placeholder appendString:[NSString stringWithCharacters:&currentChar length:1]];
        } else if ([placeholder length] > 0) {
            // A non-digit ends the run: store the number and start over
            [retVal addObject:@([placeholder integerValue])];
            [placeholder setString:@""];
        }
    }
    // Store a number that runs up to the end of the string
    if ([placeholder length] > 0) {
        [retVal addObject:@([placeholder integerValue])];
    }
    return [retVal copy];
}
To explain what is happening above, essentially I am:
going through every character until I find a digit
appending that digit, and any digits that follow it, to a string
once a non-digit is reached, adding the collected number to the array
Hope this helps. Please ask for clarification if needed.

matching multiple words with enumerateSubstringsInRange in NSMutableAttributedString

I am trying to match the string below but unfortunately it only gives me "nope" as the result. Can anyone help? thanks in advance!
NSMutableAttributedString *text = [[NSMutableAttributedString alloc] initWithString:@"darn thing suddenly erupted without any warning."];
NSString *findMe = @"suddenly erupted";
[[text string] enumerateSubstringsInRange:NSMakeRange(0, [text length]) options:NSStringEnumerationByWords usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
    if ([findMe isEqualToString:substring]) {
        NSLog(@"found it");
    }
    else {
        NSLog(@"nope");
    }
}];
Your method is only enumerating separate words. "suddenly erupted" are two words.
Why don't you use -rangeOfString: to find out whether the text contains some substring? For example:
NSLog(@"%@", [[text mutableString] rangeOfString:findMe].location == NSNotFound ? @"nope" : @"found it");
enumerateSubstringsInRange:options:usingBlock: accepts options such as:
NSStringEnumerationByLines
NSStringEnumerationBySentences
NSStringEnumerationByParagraphs
NSStringEnumerationByComposedCharacterSequences
NSStringEnumerationByWords
With NSStringEnumerationByWords, the comparison only works if the string you are looking for is a single word, e.g.
NSString *text = @"darn thing suddenlyerupted without any warning.";
NSString *findMe = @"suddenlyerupted";
So you can't match a multi-word substring this way. You need to customize the block or switch to some other approach.

Splitting string to array by constant number

I've been trying to split a string into an array of components of a fixed length, but have no idea how to do it. I know that each component's length is 9, except for the last one, and there is no separator between them. Does anyone know how I could make this split possible?
string: E44000000R33000444V33441
And I'd like to get an array with: E44000000 R33000444 V33441
In the past I've used this method, but I guess there should be a way to split by a constant length. Any ideas?
NSArray *myWords = [message componentsSeparatedByString:@";"];
Please try the below code.
NSString *stringTest = @"E44000000R33000444V33441323";
NSMutableArray *arrayTest = [NSMutableArray array];
while ([stringTest length] > 9) {
    [arrayTest addObject:[stringTest substringToIndex:9]];
    stringTest = [stringTest substringFromIndex:9];
}
// Whatever remains (9 characters or fewer) is the last component
if ([stringTest length] > 0) {
    [arrayTest addObject:stringTest];
}
NSLog(@"arrayTest - %@", arrayTest);
Try this one..
NSString *mainString = @"E44000000R33000444V";
NSMutableArray *brokenString = [NSMutableArray new];
NSUInteger start = 0;
// Comparing start + 9 avoids unsigned underflow when the string is shorter than 9
for (; start + 9 < mainString.length; start += 9) {
    [brokenString addObject:[mainString substringWithRange:NSMakeRange(start, 9)]];
}
[brokenString addObject:[mainString substringFromIndex:start]];
NSLog(@"->%@", brokenString);
Output is :
->(
E44000000,
R33000444,
V
)
I looked through NSString and didn't find any method like that. But you can create a category on NSString, put this method in it, and then use it as an NSString instance method.
- (NSArray *) componentSaparetedByLength:(NSUInteger) length{
NSMutableArray *array = [NSMutableArray new];
NSRange range = NSMakeRange(0, length);
NSString *subString = nil;
while (range.location + range.length <= self.length) {
subString = [self substringWithRange:range];
[array addObject:subString];
//Edit
range.location = range.length + range.location;
//Edit
range.length = length;
}
if(range.location<self.length){
subString = [self substringFromIndex:range.location];
[array addObject:subString];
}
return array;
}
You can take a substring of the desired length in a loop (over the string length) and pass the next index to get the following substring. After getting each substring, add it to the array.
Use the substringToIndex: and substringFromIndex: methods to get the substrings.
Although not a requirement here, I want to propose a solution that is capable of handling characters from more sophisticated script systems, like surrogate pairs, base characters plus combining marks, Hangul jamo, and Indic consonant clusters.
@interface NSString (Split)
-(NSArray *)arrayBySplittingWithMaximumSize:(NSUInteger)size
                                    options:(NSStringEnumerationOptions) option;
@end
@implementation NSString (Split)
-(NSArray *)arrayBySplittingWithMaximumSize:(NSUInteger)size
                                    options:(NSStringEnumerationOptions) option
{
    // Collect the individual units (e.g. composed character sequences)
    NSMutableArray *letterArray = [NSMutableArray array];
    [self enumerateSubstringsInRange:NSMakeRange(0, [self length])
                             options:(option)
                          usingBlock:^(NSString *substring,
                                       NSRange substringRange,
                                       NSRange enclosingRange,
                                       BOOL *stop) {
        [letterArray addObject:substring];
    }];
    // Group the units into chunks of at most `size` units each
    NSMutableArray *array = [NSMutableArray array];
    [letterArray enumerateObjectsUsingBlock:^(id obj, NSUInteger idx, BOOL *stop) {
        if (idx%size == 0) {
            [array addObject: [NSMutableString stringWithCapacity:size]];
        }
        NSMutableString *string = [array objectAtIndex:[array count]-1];
        [string appendString:obj];
    }];
    return array;
}
@end
usage
NSArray *array = [@"E44000000R33000444V33441" arrayBySplittingWithMaximumSize:9
                                                                     options:NSStringEnumerationByComposedCharacterSequences];
results in:
(
E44000000,
R33000444,
V33441
)

iOS - Most efficient way to find word occurrence count in a string

Given a string, I need to obtain a count of each word that appears in that string. To do so, I extracted the string into an array, by word, and searched that way, but I have the feeling that searching the string directly is more optimal. Below is the code that I originally wrote to solve the problem. I'm up for suggestions on better solutions though.
NSMutableDictionary *sets = [[NSMutableDictionary alloc] init];
NSString *paragraph = [[NSString alloc] initWithContentsOfFile:[[NSBundle mainBundle] pathForResource:@"text" ofType:@"txt"] encoding:NSUTF8StringEncoding error:NULL];
NSMutableArray *words = [[[paragraph lowercaseString] componentsSeparatedByString:@" "] mutableCopy];
while (words.count) {
    NSMutableIndexSet *indexSet = [[NSMutableIndexSet alloc] init];
    NSString *search = [words objectAtIndex:0];
    for (unsigned i = 0; i < words.count; i++) {
        if ([[words objectAtIndex:i] isEqualToString:search]) {
            [indexSet addIndex:i];
        }
    }
    [sets setObject:[NSNumber numberWithInt:indexSet.count] forKey:search];
    [words removeObjectsAtIndexes:indexSet];
}
NSLog(@"%@", sets);
Example:
Starting string:
"This is a test. This is only a test."
Results:
"This" - 2
"is" - 2
"a" - 2
"test" - 2
"only" - 1
This is exactly what an NSCountedSet is for.
You need to break the string apart into words (which iOS is nice enough to give us a function for so that we don't have to worry about punctuation) and just add each of them to the counted set, which keeps track of the number of times each object appears in the set:
NSString *string = @"This is a test. This is only a test.";
NSCountedSet *countedSet = [NSCountedSet new];
[string enumerateSubstringsInRange:NSMakeRange(0, [string length])
                           options:NSStringEnumerationByWords | NSStringEnumerationLocalized
                        usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop){
    // This block is called once for each word in the string.
    [countedSet addObject:substring];
    // If you want to ignore case, so that "this" and "This"
    // are counted the same, use this line instead to convert
    // each word to lowercase first:
    // [countedSet addObject:[substring lowercaseString]];
}];
NSLog(@"%@", countedSet);
// Results: 2012-11-13 14:01:10.567 Testing App[35767:fb03]
// <NSCountedSet: 0x885df70> (a [2], only [1], test [2], This [2], is [2])
If I had to guess, I would say NSRegularExpression for that. Like this:
NSUInteger numberOfMatches = [regex numberOfMatchesInString:string
options:0
range:NSMakeRange(0, [string length])];
That snippet was taken from here.
Edit 1.0:
Based on what Sir Till said:
NSString *string = @"This is a test, so it is a test";
NSMutableDictionary *dictionary = [NSMutableDictionary dictionary];
NSArray *arrayOfWords = [string componentsSeparatedByCharactersInSet:[NSCharacterSet whitespaceCharacterSet]];
for (NSString *word in arrayOfWords)
{
if ([dictionary objectForKey:word])
{
NSNumber *numberOfOccurences = [dictionary objectForKey:word];
NSNumber *increment = [NSNumber numberWithInt:(1 + [numberOfOccurences intValue])];
[dictionary setValue:increment forKey:word];
}
else
{
[dictionary setValue:[NSNumber numberWithInt:1] forKey:word];
}
}
You should be careful with:
Punctuation signs. (near other words)
UpperCase words vs lowerCase words.
I think it's a really bad idea to search for words in a long paragraph with a loop. You should use a regular expression for that! I know it's not easy to learn at first, but it's really worth knowing. Take a look at this question: Use regular expression to find/replace substring in NSString
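As a rough illustration of the regular-expression route, here is a minimal sketch that counts whole-word occurrences of one word. The function name CountOfWordInString and the choice of case-insensitive matching are assumptions made for the example, not something taken from the linked question.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: counts whole-word, case-insensitive occurrences
// of `word` in `text` using NSRegularExpression.
static NSUInteger CountOfWordInString(NSString *word, NSString *text) {
    // Escape the word so that characters like "." are matched literally,
    // then wrap it in word boundaries so "is" does not match inside "This".
    NSString *pattern = [NSString stringWithFormat:@"\\b%@\\b",
                            [NSRegularExpression escapedPatternForString:word]];
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        return 0; // the pattern could not be compiled
    }
    return [regex numberOfMatchesInString:text
                                  options:0
                                    range:NSMakeRange(0, [text length])];
}
```

Note that this counts one word per call; to build a full frequency table for every word in the text, a counted set is probably still the simpler tool.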

Objective-C Find the most commonly used words in an NSString

I am trying to write a method:
- (NSDictionary *)wordFrequencyFromString:(NSString *)string {}
where the dictionary returned will have the words and how often they were used in the string provided. Unfortunately, I can't seem to find a way to iterate through words in a string to analyze each one - only each character which seems like a bit more work than necessary. Any suggestions?
NSString has the -enumerateSubstringsInRange:options:usingBlock: method, which lets you enumerate all words directly and leaves it to the standard API to do everything needed to determine word boundaries, etc.:
[s enumerateSubstringsInRange:NSMakeRange(0, [s length])
                      options:NSStringEnumerationByWords
                   usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
    NSLog(@"%@", substring);
}];
In the enumeration block you can use either NSDictionary with words as keys and NSNumber as their counts, or use NSCountedSet that provides required functionality for counts.
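Putting the two pieces together, one possible shape for the requested method is sketched below as a plain function. The name WordFrequencyFromString is hypothetical, and lowercasing each word (so that "This" and "this" count as one) is an assumption you may want to drop.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: maps each lowercased word in `string` to an NSNumber
// holding the number of times it occurs.
static NSDictionary *WordFrequencyFromString(NSString *string) {
    NSCountedSet *counts = [NSCountedSet set];
    [string enumerateSubstringsInRange:NSMakeRange(0, [string length])
                               options:NSStringEnumerationByWords
                            usingBlock:^(NSString *word, NSRange wordRange,
                                         NSRange enclosingRange, BOOL *stop) {
        [counts addObject:[word lowercaseString]];
    }];
    // Copy the counts into a dictionary to match the requested return type.
    NSMutableDictionary *frequencies =
        [NSMutableDictionary dictionaryWithCapacity:[counts count]];
    for (NSString *word in counts) {
        NSNumber *count = [NSNumber numberWithUnsignedInteger:[counts countForObject:word]];
        [frequencies setObject:count forKey:word];
    }
    return [frequencies copy];
}
```

The counted set does the bookkeeping; the dictionary conversion at the end exists only because the method signature in the question returns an NSDictionary.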
You can use componentsSeparatedByCharactersInSet: to split the string and NSCountedSet will count the words for you.
1) Split the string into words using a combination of the punctuation, whitespace and new line character sets:
NSMutableCharacterSet *separators = [NSMutableCharacterSet punctuationCharacterSet];
[separators formUnionWithCharacterSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
NSArray *words = [myString componentsSeparatedByCharactersInSet:separators];
2) Count the occurrences of the words (if you want to disregard capitalization, you can do NSString *myString = [originalString lowercaseString]; before splitting the string into components):
NSCountedSet *frequencies = [NSCountedSet setWithArray:words];
NSUInteger aWordCount = [frequencies countForObject:@"word"];
If you are willing to change your method signature, you can just return the counted set.
Split the string into an array of words using -[NSString componentsSeparatedByCharactersInSet:] first. (Use [[NSCharacterSet letterCharacterSet] invertedSet] as the argument to split on all non-letter characters.)
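A minimal sketch of that split follows; the function name WordsBySplittingOnNonLetters is hypothetical. One detail worth knowing is that adjacent separators (such as a period followed by a space) produce empty strings, so the sketch filters them out.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits `string` on every non-letter character and
// returns the non-empty pieces in order.
static NSArray *WordsBySplittingOnNonLetters(NSString *string) {
    NSCharacterSet *nonLetters = [[NSCharacterSet letterCharacterSet] invertedSet];
    NSArray *pieces = [string componentsSeparatedByCharactersInSet:nonLetters];
    NSMutableArray *words = [NSMutableArray array];
    for (NSString *piece in pieces) {
        // Consecutive separators yield empty strings; skip them.
        if ([piece length] > 0) {
            [words addObject:piece];
        }
    }
    return [words copy];
}
```

Because apostrophes and digits are not letters, a word like "don't" is split into "don" and "t" with this approach, which may or may not be what you want.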
I used the following approach to get the most common word from an NSString.
-(void)countMostFrequentWordInSpeech:(NSString*)speechString
{
    NSString *string = speechString;
    NSCountedSet *countedSet = [NSCountedSet new];
    [string enumerateSubstringsInRange:NSMakeRange(0, [string length])
                               options:NSStringEnumerationByWords | NSStringEnumerationLocalized
                            usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop){
        [countedSet addObject:substring];
    }];
    // NSLog(@"%@", countedSet);
    // Sort the counted set; the most frequent word ends up at index 0 of the resulting array
    NSMutableArray *dictArray = [NSMutableArray array];
    [countedSet enumerateObjectsUsingBlock:^(id obj, BOOL *stop) {
        [dictArray addObject:@{@"object": obj,
                               @"count": @([countedSet countForObject:obj])}];
    }];
    NSArray *sortedArrayOfWord = [dictArray sortedArrayUsingDescriptors:@[[NSSortDescriptor sortDescriptorWithKey:@"count" ascending:NO]]];
    if (sortedArrayOfWord.count > 0)
    {
        self.mostFrequentWordLabel.text = [NSString stringWithFormat:@"Frequent Word: %@", [[sortedArrayOfWord[0] valueForKey:@"object"] capitalizedString]];
    }
}
"speechString" is the string from which I need to get the most frequent word. The object at index 0 of the array "sortedArrayOfWord" is the most common word.