Separate Full Sentences in a block of NSString text - objective-c

I have been trying to use a regular expression to separate full sentences in a big block of text. I can't use componentsSeparatedByCharactersInSet: because it fails on sentences ending in ?!, !!, ... I have seen some external classes that do componentSeparateByRegEx, but I would prefer not to add an external library.
Here is a sample input
Hi, I am testing. How are you? Wow!! this is the best, and I am happy.
The output should be an array
first element: Hi, I am testing.
second element: How are you?
third element: Wow!!
fourth element: this is the best, and I am happy.
This is what I have, but as I mentioned it doesn't do what I intend. A regular expression would probably do a much better job here.
-(NSArray *)getArrayOfFullSentencesFromBlockOfText:(NSString *)textBlock{
    NSMutableCharacterSet *characterSet = [[NSMutableCharacterSet alloc] init];
    [characterSet addCharactersInString:@".?!"];
    NSArray * sentenceArray = [textBlock componentsSeparatedByCharactersInSet:characterSet];
    return sentenceArray;
}
Thanks for your help,

You want to use -[NSString enumerateSubstringsInRange:options:usingBlock:] with the NSStringEnumerationBySentences option. This will give you every sentence, and it does so in a language-aware manner.
NSArray *fullSentencesFromText(NSString *text) {
NSMutableArray *results = [NSMutableArray array];
[text enumerateSubstringsInRange:NSMakeRange(0, [text length]) options:NSStringEnumerationBySentences usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
[results addObject:substring];
}];
return results;
}
Note, in testing, each substring appears to contain the trailing spaces after the punctuation. You may want to strip those out.
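As a minimal sketch of stripping that trailing whitespace, each substring can be trimmed before it is stored. The function name TrimmedSentencesFromText is made up for illustration; everything else is standard Foundation API.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (name is an assumption): enumerates sentences and
// trims the surrounding whitespace/newlines from each one.
static NSArray<NSString *> *TrimmedSentencesFromText(NSString *text) {
    NSMutableArray<NSString *> *results = [NSMutableArray array];
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    [text enumerateSubstringsInRange:NSMakeRange(0, text.length)
                             options:NSStringEnumerationBySentences
                          usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
        NSString *trimmed = [substring stringByTrimmingCharactersInSet:whitespace];
        // Skip anything that was only whitespace
        if (trimmed.length > 0) {
            [results addObject:trimmed];
        }
    }];
    return results;
}
```

Calling TrimmedSentencesFromText on the sample input from the question should give sentences without the trailing spaces.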

Something like this could do the job:
NSString *msg = @"Hi, I am testing. How are you? Wow!! this is the best, and I am happy.";
[msg enumerateSubstringsInRange:NSMakeRange(0, [msg length])
                        options:NSStringEnumerationBySentences | NSStringEnumerationLocalized
                     usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop)
{
    NSLog(@"Sentence:%@", substring);
    // Add each sentence into an array
}];

Or use:
[mutstri enumerateSubstringsInRange:NSMakeRange(0, [mutstri length])
options:NSStringEnumerationBySentences
usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop){
NSLog(@"%@", substring);
}];

Related

Get words after a certain sign of NSString

I need to get the word that comes after a certain sign, and remove it.
example :
NSString *me=@" i am going to make !something great" ;
I need to remove the word something, together with the ! sign, wherever it occurs in that text.
Is there some method like stringByReplacingOccurrencesOfString: that not only finds the sign, but also identifies the word attached to it?
Thanks.
You want a regular expression. In this case, you want one with the pattern !\w* (written as @"!\\w*" in an Objective-C string literal). (An NSScanner would also work, but I think a regular expression is more concise in this case.)
If you have reasons not to use regular expressions (or if you are not familiar with them), you can use the following:
NSString *me = @" i am going to make !something great";
NSRange r1 = [me rangeOfString:@"!"];
if (r1.location != NSNotFound) {
    // Find the first whitespace after the "!" to locate the end of the word
    NSRange r2 = [me rangeOfCharacterFromSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]
                                     options:0
                                       range:NSMakeRange(r1.location, me.length - r1.location)];
    // If no whitespace follows, the word runs to the end of the string
    NSUInteger end = (r2.location != NSNotFound) ? r2.location : me.length;
    // Removes only the first "!word" occurrence
    me = [me stringByReplacingCharactersInRange:NSMakeRange(r1.location, end - r1.location) withString:@""];
}
Here's code:
NSMutableString *mutableMe = [me mutableCopy];
NSError *error;
NSRegularExpression *regex = [[NSRegularExpression alloc] initWithPattern:@"!\\w*" options:0 error:&error];
[regex replaceMatchesInString:mutableMe options:0 range:NSMakeRange(0, [mutableMe length]) withTemplate:@""];
If you want to find the matches first, then use:
NSArray *matches = [regex matchesInString:mutableMe options:0 range:NSMakeRange(0, [mutableMe length])];
// Walk the matches from last to first so that deleting one does not shift the ranges still to be processed
for (NSTextCheckingResult *result in [matches reverseObjectEnumerator]) {
    NSRange rangeOfString = [result rangeAtIndex:0];
    [mutableMe replaceCharactersInRange:rangeOfString withString:@""];
}
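For comparison, here is a rough sketch of the NSScanner alternative mentioned earlier. The function name StringByRemovingBangWords is hypothetical, and the sketch assumes a "word" means a run of alphanumeric characters, which is slightly narrower than the regex \w (it excludes underscores).

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (name is an assumption): removes every "!word" token.
// Assumes a word is a run of alphanumeric characters following the "!".
static NSString *StringByRemovingBangWords(NSString *input) {
    NSScanner *scanner = [NSScanner scannerWithString:input];
    scanner.charactersToBeSkipped = nil; // keep whitespace exactly as it is
    NSCharacterSet *wordCharacters = [NSCharacterSet alphanumericCharacterSet];
    NSMutableString *output = [NSMutableString string];
    while (!scanner.isAtEnd) {
        NSString *chunk = nil;
        // Copy everything up to the next "!" into the output
        if ([scanner scanUpToString:@"!" intoString:&chunk]) {
            [output appendString:chunk];
        }
        // Consume the "!" and the word attached to it, discarding both
        if ([scanner scanString:@"!" intoString:NULL]) {
            [scanner scanCharactersFromSet:wordCharacters intoString:NULL];
        }
    }
    return output;
}
```

Unlike the rangeOfString: approach, this handles every occurrence in the text, including one at the very end of the string.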
If the word is known in advance, the following may help:
NSString *replacedString = [me stringByReplacingOccurrencesOfString:@"!something" withString:@" "];

matching multiple words with enumerateSubstringsInRange in NSMutableAttributedString

I am trying to match the string below but unfortunately it only gives me "nope" as the result. Can anyone help? thanks in advance!
NSMutableAttributedString *text = [[NSMutableAttributedString alloc] initWithString:@"darn thing suddenly erupted without any warning."];
NSString *findMe = @"suddenly erupted";
[[text string] enumerateSubstringsInRange:NSMakeRange(0, [text length]) options:NSStringEnumerationByWords usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
    if ([findMe isEqualToString:substring] ) {
        NSLog(@"found it");
    }
    else {
        NSLog(@"nope");
    }
}];
Your method is only enumerating separate words. "suddenly erupted" are two words.
Why don't you use -rangeOfString: to find out whether the text contains some substring? For example:
NSLog(@"%@",[[text mutableString] rangeOfString:findMe].location == NSNotFound ? @"nope" : @"found it");
enumerateSubstringsInRange:options:usingBlock: has options such as
NSStringEnumerationByLines
NSStringEnumerationBySentences
NSStringEnumerationByParagraphs
NSStringEnumerationByComposedCharacterSequences
NSStringEnumerationByWords
If you are comparing against a single word, it will work,
e.g.
NSString *text = @"darn thing suddenlyerupted without any warning.";
NSString *findMe = @"suddenlyerupted";
So you can't compare a multi-word substring this way. You need to customize the block or use some other approach.
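One possible "other approach" is to skip word enumeration and search for the whole phrase, collecting every range where it occurs. The sketch below is illustrative only: the function name RangesOfPhrase is made up, and the case-insensitive option is an assumption about what is wanted. The returned ranges could then be used with an NSMutableAttributedString, for example to add attributes.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (name is an assumption): returns the range of every
// occurrence of `phrase` in `text`, wrapped in NSValue objects.
static NSArray<NSValue *> *RangesOfPhrase(NSString *phrase, NSString *text) {
    NSMutableArray<NSValue *> *ranges = [NSMutableArray array];
    if (phrase.length == 0) {
        return ranges; // nothing sensible to search for
    }
    NSRange searchRange = NSMakeRange(0, text.length);
    while (searchRange.length > 0) {
        NSRange found = [text rangeOfString:phrase
                                    options:NSCaseInsensitiveSearch
                                      range:searchRange];
        if (found.location == NSNotFound) {
            break;
        }
        [ranges addObject:[NSValue valueWithRange:found]];
        // Continue searching after the match that was just found
        NSUInteger next = NSMaxRange(found);
        searchRange = NSMakeRange(next, text.length - next);
    }
    return ranges;
}
```

Each stored value can be read back with -rangeValue.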

Extracting sentences containing keywords objective c

I have a block of text (a newspaper article, if it's of any relevance) and was wondering if there is a way to extract all sentences containing a particular keyword in Objective-C. I've been looking a bit at ParseKit but am not having much luck!
You can enumerate sentences using native NSString methods like this...
NSString *string = @"your text";
NSMutableArray *sentences = [NSMutableArray array];
[string enumerateSubstringsInRange:NSMakeRange(0, string.length)
                           options:NSStringEnumerationBySentences
                        usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
    //check that this sentence has the string you are looking for
    NSRange range = [substring rangeOfString:@"The text you are looking for"];
    if (range.location != NSNotFound) {
        [sentences addObject:substring];
    }
}];
for (NSString *sentence in sentences) {
    NSLog(@"%@", sentence);
}
At the end you will have an array of sentences all containing the text you were looking for.
Edit: As noted in the comments, there are some inherent weaknesses in my solution, as it requires perfectly formatted text where period + space is only used when actually ending a sentence... I'll leave it here as it could be viable for people splitting a text on another (known) separator.
Here's another way of achieving what you want:
NSString *wordYouAreLookingFor = @"happy";
NSArray *arrayOfSentences = [aString componentsSeparatedByString:@". "]; // get the single sentences
NSMutableArray *sentencesWithMatchingWord = [[NSMutableArray alloc] init];
for (NSString *singleSentence in arrayOfSentences) {
    NSInteger originalSize = [singleSentence length];
    NSString *possibleNewString = [singleSentence stringByReplacingOccurrencesOfString:wordYouAreLookingFor withString:@""];
    if (originalSize != [possibleNewString length]) {
        [sentencesWithMatchingWord addObject:singleSentence];
    }
}

Objective-C Find the most commonly used words in an NSString

I am trying to write a method:
- (NSDictionary *)wordFrequencyFromString:(NSString *)string {}
where the dictionary returned will have the words and how often they were used in the string provided. Unfortunately, I can't seem to find a way to iterate through words in a string to analyze each one - only each character which seems like a bit more work than necessary. Any suggestions?
NSString has the -enumerateSubstringsInRange:options:usingBlock: method, which lets you enumerate all words directly and leaves it to the standard API to work out word boundaries etc.:
[s enumerateSubstringsInRange:NSMakeRange(0, [s length])
options:NSStringEnumerationByWords
usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
NSLog(@"%@", substring);
}];
In the enumeration block you can use either NSDictionary with words as keys and NSNumber as their counts, or use NSCountedSet that provides required functionality for counts.
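A minimal sketch of the NSCountedSet variant, written as a plain function rather than the method from the question. The name WordFrequencyFromString and the decision to lowercase each word before counting are assumptions made for illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (name is an assumption): maps each word to the number
// of times it occurs. Lowercasing is an assumption so that "The" and "the"
// are counted together.
static NSDictionary<NSString *, NSNumber *> *WordFrequencyFromString(NSString *string) {
    NSCountedSet *counts = [NSCountedSet set];
    [string enumerateSubstringsInRange:NSMakeRange(0, string.length)
                               options:NSStringEnumerationByWords
                            usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
        [counts addObject:substring.lowercaseString];
    }];
    // Copy the counts into a dictionary of word -> NSNumber
    NSMutableDictionary<NSString *, NSNumber *> *result =
        [NSMutableDictionary dictionaryWithCapacity:counts.count];
    for (NSString *word in counts) {
        result[word] = @([counts countForObject:word]);
    }
    return result;
}
```

For example, WordFrequencyFromString(@"The cat and the hat") would report a count of 2 for "the".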
You can use componentsSeparatedByCharactersInSet: to split the string and NSCountedSet will count the words for you.
1) Split the string into words using a combination of the punctuation, whitespace and new line character sets:
NSMutableCharacterSet *separators = [NSMutableCharacterSet punctuationCharacterSet];
[separators formUnionWithCharacterSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
NSArray *words = [myString componentsSeparatedByCharactersInSet:separators];
2) Count the occurrences of the words (if you want to disregard capitalization, you can do NSString *myString = [originalString lowercaseString]; before splitting the string into components):
NSCountedSet *frequencies = [NSCountedSet setWithArray:words];
NSUInteger aWordCount = [frequencies countForObject:@"word"];
If you are willing to change your method signature, you can just return the counted set.
Split the string into an array of words using -[NSString componentsSeparatedByCharactersInSet:] first. (Use [[NSCharacterSet letterCharacterSet] invertedSet] as the argument to split on all non-letter characters.)
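One thing to watch with this kind of splitting is that adjacent separators (for example a comma followed by a space) produce empty strings in the resulting array, which would then be counted as a "word". A small sketch that filters them out; the function name WordsBySplittingOnNonLetters is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (name is an assumption): splits on every non-letter
// character and drops the empty strings left between adjacent separators.
static NSArray<NSString *> *WordsBySplittingOnNonLetters(NSString *text) {
    NSCharacterSet *nonLetters = [[NSCharacterSet letterCharacterSet] invertedSet];
    NSArray<NSString *> *pieces = [text componentsSeparatedByCharactersInSet:nonLetters];
    NSPredicate *nonEmpty = [NSPredicate predicateWithFormat:@"length > 0"];
    return [pieces filteredArrayUsingPredicate:nonEmpty];
}
```

The filtered array can then be passed to +[NSCountedSet setWithArray:] as described above.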
I used following approach for getting most common word from NSString.
-(void)countMostFrequentWordInSpeech:(NSString*)speechString
{
NSString *string = speechString;
NSCountedSet *countedSet = [NSCountedSet new];
[string enumerateSubstringsInRange:NSMakeRange(0, [string length])
options:NSStringEnumerationByWords | NSStringEnumerationLocalized
usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop){
[countedSet addObject:substring];
}];
// NSLog(@"%@", countedSet);
//Sort CountedSet & get most frequent common word at 0th index of resultant array
NSMutableArray *dictArray = [NSMutableArray array];
[countedSet enumerateObjectsUsingBlock:^(id obj, BOOL *stop) {
[dictArray addObject:@{@"object": obj,
@"count": @([countedSet countForObject:obj])}];
}];
NSArray *sortedArrayOfWord= [dictArray sortedArrayUsingDescriptors:@[[NSSortDescriptor sortDescriptorWithKey:@"count" ascending:NO]]];
if (sortedArrayOfWord.count>0)
{
self.mostFrequentWordLabel.text=[NSString stringWithFormat:@"Frequent Word: %@", [[sortedArrayOfWord[0] valueForKey:@"object"] capitalizedString]];
}
}
"speechString" is my string from which I have to get most frequent/common words. Object at 0th index of array "sortedArrayOfWord" would be most common word.

Count of chars in NSString or NSMutableString?

I've tried this
NSCharacterSet *myCharSet = [NSCharacterSet characterSetWithCharactersInString: myString];
[myCharSet count];
But get a warning that NSCharacterSet may not respond to count. This is for desktop apps and not iPhone, which I think the above code works with.
I might be missing something here, but what's wrong with simply doing:
NSUInteger characterCount = [myString length];
To just get the number of characters in a string, I don't see any reason to mess around with NSCharacterSet.
That should not work on the iPhone either, as NSCharacterSet is not a subclass of NSSet on either platform.
If you really need to get a count why not subclass NSSet, add the value, then have a method that returns that as an NSCharacterSet on demand for use in anything that needs a character set?
NSString *string = @"0̄ 😄";
__block NSUInteger count = 0;
// Count user-visible characters (composed character sequences) rather than UTF-16 code units
[string enumerateSubstringsInRange:NSMakeRange(0, [string length])
                           options:NSStringEnumerationByComposedCharacterSequences
                        usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
    count++;
}];
NSLog(@"%lu %lu", (unsigned long)count, (unsigned long)[string length]);