Finding 2 Capitalized Words in a Row in an NSString - objective-c

I'm writing a Mac app that goes through an NSString and adds all its words to an NSArray (by separating them based on whitespace). I've got the whole system down, but I'm still having one little problem: names (first + last) are added as two different words, and that's bothersome to me.
I thought of a couple of solutions to fix this. My best idea was to join two capitalized words in a row before actually adding the words to the array. Then, with an if statement, I would check whether a word contains two capitals and, if so, split it back apart and add the pair as one entry. However, I can't find a way to find two capitalized words in a row.
Should I be using something like RegexKitLite (which I'm not familiar with) to find two capitalized words in a row? I've seen this question: Regexp to pull capitalized words not at the beginning of sentence and two adjacent words
which seems related, but because I don't understand regular expressions well, I don't really know if it is exactly what I need.
I've also seen this: Separating NSString into NSArray, but allowing quotes to group words
which is also similar, yet not exactly adapted to my needs.
So, to conclude: does anyone know how to join capitalized words in an NSString, or even better, how to find two capitalized words in a row in an NSString?

If you're targeting iOS 4.0 or later, or OS X 10.7 or later, you can use NSRegularExpression:
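NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression
    regularExpressionWithPattern:@"[A-Z]\\w*\\s[A-Z]\\w*"
                         options:0
                           error:&error];
NSString *inputString = @"One two Three Four five six Seven Eight";
NSArray *matches = [regex
    matchesInString:inputString
            options:0
              range:NSMakeRange(0, [inputString length])];
// matchesInString: returns NSTextCheckingResult objects, so pull out the matched text.
NSMutableArray *stringsWithTwoCapitalizedWordsInARow = [NSMutableArray array];
for (NSTextCheckingResult *match in matches) {
    [stringsWithTwoCapitalizedWordsInARow addObject:[inputString substringWithRange:[match range]]];
}
You'll get something like this:
["Three Four", "Seven Eight"]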

You could just do a second pass over the resulting array, after it has been loaded, to join adjacent entries that belong together.
Names are notoriously difficult to match with regular expressions alone, as it is not unheard of for names (first or last) to contain spaces themselves.
NSMutableArray *words = ...;
NSMutableArray *joinedWords = [NSMutableArray array];
for (NSInteger i = 0; i < (NSInteger)[words count]; i++)
{
    NSString *currentLine = [words objectAtIndex:i];
    BOOL capitalized = isCap(currentLine); // isCap() is your own check: up to your discretion here
    BOOL capitalizedNext = NO;
    NSString *nextLine = nil;
    // the last entry has no next word
    if (i + 1 < (NSInteger)[words count])
    {
        nextLine = [words objectAtIndex:i + 1];
        capitalizedNext = isCap(nextLine);
    }
    // both this word and the next one are capitalized: merge them
    if (capitalized && capitalizedNext)
    {
        [words replaceObjectAtIndex:i withObject:[NSString stringWithFormat:@"%@ %@", currentLine, nextLine]];
        [words removeObjectAtIndex:i + 1];
        // Run test again on new version of the line
        i--;
    }
    else
    {
        [joinedWords addObject:currentLine];
    }
}
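Here is a self-contained sketch of the same second pass. It builds a new array rather than editing words in place. IsCapitalized and JoinCapitalizedRuns are hypothetical names. IsCapitalized stands in for isCap and assumes that "capitalized" means "the first character is an uppercase letter". Like the loop above, it merges a whole run of capitalized words, not just a pair.

```objc
#import <Foundation/Foundation.h>

// Hypothetical stand-in for isCap(): YES if the first character is an uppercase letter.
static BOOL IsCapitalized(NSString *word) {
    return [word length] > 0 &&
           [[NSCharacterSet uppercaseLetterCharacterSet] characterIsMember:[word characterAtIndex:0]];
}

// Returns a new array in which each run of consecutive capitalized words becomes one entry.
static NSArray *JoinCapitalizedRuns(NSArray *words) {
    NSMutableArray *joined = [NSMutableArray arrayWithCapacity:[words count]];
    NSMutableArray *run = [NSMutableArray array]; // capitalized words not yet flushed
    for (NSString *word in words) {
        if (IsCapitalized(word)) {
            [run addObject:word];
            continue;
        }
        // A non-capitalized word ends the current run, if there is one.
        if ([run count] > 0) {
            [joined addObject:[run componentsJoinedByString:@" "]];
            [run removeAllObjects];
        }
        [joined addObject:word];
    }
    // Flush a run that reaches the end of the array.
    if ([run count] > 0) {
        [joined addObject:[run componentsJoinedByString:@" "]];
    }
    return joined;
}
```

The capitalization test is only a heuristic. A capitalized sentence-initial word such as "The" will be merged with a name that follows it.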

[A-Z][A-Za-z]* [A-Z][A-Za-z]*|\S+
http://rubular.com/r/DrOabOAfBr
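I've written a regular expression for you. It tries to match a name (two capitalized words separated by a space) first, then falls back to a single word. Your job is then as simple as feeding it into NSRegularExpression and taking all the matches as your words, with names already joined. The fallback is written as \S+ rather than [\S]*, so it never produces empty matches.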
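A minimal sketch of feeding that pattern into NSRegularExpression and collecting the matched text. WordsAndNames is a hypothetical function name. The fallback is written as \S+ so that it never produces empty matches.

```objc
#import <Foundation/Foundation.h>

// Returns every match of the "name, else single word" pattern, in order.
static NSArray *WordsAndNames(NSString *text) {
    NSError *error = nil;
    // Two capitalized words separated by one space, or else any run of non-whitespace.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"[A-Z][A-Za-z]* [A-Z][A-Za-z]*|\\S+"
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Bad pattern: %@", error);
        return [NSArray array];
    }
    NSMutableArray *tokens = [NSMutableArray array];
    NSArray *matches = [regex matchesInString:text options:0 range:NSMakeRange(0, [text length])];
    for (NSTextCheckingResult *match in matches) {
        [tokens addObject:[text substringWithRange:[match range]]];
    }
    return tokens;
}
```

Because the regex engine tries the alternatives left to right at each position, "Three Four" is taken as one token before the single-word fallback gets a chance. Punctuation stays attached to words matched by \S+ (for example "Hi,").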

Related

NSPredicate Detect First & Last Name

I am trying to use NSPredicate to evaluate whether or not an NSString has both a first and last name (essentially a space between two non-digit words). This code hasn't been working for me (code taken and modified slightly from What are best practices for validating email addresses in Objective-C for iOS 2.0?):
-(BOOL) validName:(NSString*) nameString {
    NSString *regExPattern = @"[A-Z]+_[A-Z]";
    NSRegularExpression *regEx = [[NSRegularExpression alloc] initWithPattern:regExPattern options:NSRegularExpressionCaseInsensitive error:nil];
    NSUInteger regExMatches = [regEx numberOfMatchesInString:nameString options:0 range:NSMakeRange(0, [nameString length])];
    if (regExMatches == 0) {
        return NO;
    } else {
        return YES;
    }
}
I think there is something wrong with my regEx pattern, but I'm not sure how to fix it. This is how I check the string:
if([self validName:nameTextField.text]) {
    // Valid Name
} else {
    // Name not valid
}
First, if you want to match a space, then just put a space in the regex pattern. The underscore you have now will require an underscore in your name field in order to match.
Second, if you evaluate the pattern with NSPredicate's MATCHES operator, the whole string must match the regex. The pattern would then reject normal last names, which have more than one character, even with the space fixed. You'll need to add an expression that covers the rest of the last name.
Third, since you pass the text field's contents directly into the check, you are relying on your users to type everything exactly as you expect. You might want to clean the string up a bit before testing it. Personally, I would at least trim leading and trailing whitespace and replace runs of spaces with a single space.
Here is some code that does this:
NSString *regExPattern = @"[A-Z]+ [A-Z]+"; //Added a "+" to match the whole string up to the end.
Check:
NSString *name = nameTextField.text;
name = [name stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
name = [name stringByReplacingOccurrencesOfString:@" +"
                                       withString:@" "
                                          options:NSRegularExpressionSearch
                                            range:NSMakeRange(0, name.length)];
if([self validName: name]) {
    // Valid Name
} else {
    // Name not valid
}
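The cleanup and the check can be combined into one function that uses NSPredicate, as the question intended. MATCHES requires the entire string to match the pattern. This is a sketch only: IsFirstAndLastName is a hypothetical name, and the pattern [A-Za-z]+ [A-Za-z]+ is an assumption that accepts exactly two ASCII-letter words.

```objc
#import <Foundation/Foundation.h>

// YES if the cleaned-up string is exactly two letter-only words separated by one space.
static BOOL IsFirstAndLastName(NSString *rawName) {
    // Trim the ends, then collapse runs of spaces into a single space.
    NSString *name = [rawName stringByTrimmingCharactersInSet:
                                  [NSCharacterSet whitespaceAndNewlineCharacterSet]];
    name = [name stringByReplacingOccurrencesOfString:@" +"
                                           withString:@" "
                                              options:NSRegularExpressionSearch
                                                range:NSMakeRange(0, [name length])];
    // MATCHES compares the whole string against the regex, not just a substring.
    NSPredicate *predicate = [NSPredicate predicateWithFormat:@"SELF MATCHES %@",
                                                              @"[A-Za-z]+ [A-Za-z]+"];
    return [predicate evaluateWithObject:name];
}
```

For example, "  Jim   Smith " passes after cleanup, while "Jim" and "Jim Smith Jr" do not.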
As you can imagine, there are many ways to do this, but this is a start. You should reconsider your test for "correct" names, though. Many names won't pass your simple regex, for instance names with apostrophes or accents:
Jim O'Malley
Zoë Jones
etc.
If you just want to check for the space-separated fore- and surname, I would try this:
- (BOOL)validName:(NSString*)name
{
    NSArray *components = [name componentsSeparatedByString:@" "];
    return ([components count] >= 2);
}
This checks that you have at least two components separated by a space. It also works for names with three or more components (middle names).
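When you split on a single space with componentsSeparatedByString:@" ", a leading, trailing or doubled space produces empty components, and those still count. "Jim " therefore yields two components. Here is a sketch that splits on any whitespace and counts only the non-empty parts. HasAtLeastTwoNameParts is a hypothetical name.

```objc
#import <Foundation/Foundation.h>

// YES if the name has at least two non-empty whitespace-separated parts.
static BOOL HasAtLeastTwoNameParts(NSString *name) {
    NSArray *parts = [name componentsSeparatedByCharactersInSet:
                               [NSCharacterSet whitespaceAndNewlineCharacterSet]];
    NSUInteger count = 0;
    for (NSString *part in parts) {
        // Empty strings come from leading, trailing or repeated whitespace.
        if ([part length] > 0) {
            count++;
        }
    }
    return count >= 2;
}
```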

Search exact word in NSString

I need to find a word or several words. With this method, however, I also find parts of words:
NSString *searchString = [NSString stringWithFormat:@"%@",searchField.text];
NSRange range = [textString rangeOfString : searchString];
if (range.location != NSNotFound) {
    NSLog(@"textString = %@", textString);
}
I need exact matches for the word or words.
How can I do this?
Thank you!
There are various ways of parsing/finding substrings in an NSString:
NSString itself
NSRegularExpression. This would probably suit your needs better, since you can handle the whitespace and punctuation around words. Thus it won't return the "cat" in "catapult" when searching for "cat".
NSScanner (most likely overkill for your needs)
... and each of them, of course, has its pros and cons.
NSString has 9 methods grouped under "Finding Characters and Substrings". Methods such as:
-rangeOfString:
Finds and returns the range of the first occurrence of a given string within the receiver.
NSRegularExpression has 5 methods grouped under "Searching Strings Using Regular Expressions". Methods such as:
-numberOfMatchesInString: options: range:
Returns the number of matches of the regular expression within the specified range of the string.
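A minimal sketch of the NSRegularExpression route, using -numberOfMatchesInString:options:range: with \b word boundaries around the search term. CountWholeWordMatches is a hypothetical name. The term is passed through +escapedPatternForString:, so characters like "." or "?" in the user's input are matched literally.

```objc
#import <Foundation/Foundation.h>

// Counts occurrences of `word` that are not part of a longer word (case-sensitive).
static NSUInteger CountWholeWordMatches(NSString *text, NSString *word) {
    // \b on both sides anchors the match at word boundaries.
    NSString *pattern = [NSString stringWithFormat:@"\\b%@\\b",
                                  [NSRegularExpression escapedPatternForString:word]];
    NSError *error = nil;
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern
                                                                           options:0
                                                                             error:&error];
    if (regex == nil) {
        return 0;
    }
    return [regex numberOfMatchesInString:text options:0 range:NSMakeRange(0, [text length])];
}
```

Pass NSRegularExpressionCaseInsensitive as the options if "Cat" should also match "cat". Note that \b only behaves intuitively when the search term starts and ends with word characters.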
It might also be useful to know about NSScanner, but this class would be more useful if you're parsing the string than simply looking for sub-parts.
What happens if you add a space at the end of the search string, like so:
NSString *searchString = [NSString stringWithFormat:@"%@ ",searchField.text];
If the string from searchField.text already ends with a space, you would have to remove it.
This is not a perfect solution yet; for example, you would not find the search string if it is at the end of a sentence. Instead of adding the whitespace character, you could look at the character after the hit and make sure that it is not a letter. For this, take a look at the class NSCharacterSet:
NSCharacterSet *letters = [NSCharacterSet letterCharacterSet];
NSUInteger end = range.location + searchString.length;
// A hit at the very end of textString has no following character to check.
if (end == textString.length || ![letters characterIsMember:[textString characterAtIndex:end]]) {
    ...
}
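That check can be extended to the character before the hit. It can also keep searching when the first hit turns out to be part of a longer word. Here is a sketch; ContainsWholeWord is a hypothetical name, and it treats only letters as word characters, as in the answer above.

```objc
#import <Foundation/Foundation.h>

// YES if `word` occurs in `text` without a letter directly before or after it.
static BOOL ContainsWholeWord(NSString *text, NSString *word) {
    if ([word length] == 0) {
        return NO;
    }
    NSCharacterSet *letters = [NSCharacterSet letterCharacterSet];
    NSUInteger length = [text length];
    NSRange searchRange = NSMakeRange(0, length);
    while (YES) {
        NSRange hit = [text rangeOfString:word options:0 range:searchRange];
        if (hit.location == NSNotFound) {
            return NO;
        }
        NSUInteger end = NSMaxRange(hit);
        // A letter on either side means the hit is part of a longer word.
        BOOL letterBefore = hit.location > 0 &&
                            [letters characterIsMember:[text characterAtIndex:hit.location - 1]];
        BOOL letterAfter = end < length &&
                           [letters characterIsMember:[text characterAtIndex:end]];
        if (!letterBefore && !letterAfter) {
            return YES;
        }
        // Resume searching one character past the start of this hit.
        searchRange = NSMakeRange(hit.location + 1, length - hit.location - 1);
    }
}
```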

capitalizedString doesn't capitalize correctly words starting with numbers?

I'm using the NSString method [myString capitalizedString] to capitalize all words of my string.
However, capitalization doesn't work very well for words starting with numbers.
e.g. 2nd chance
becomes
2Nd Chance
even though n is not the first character of the word.
thanks
You have to roll your own solution to this problem. The Apple docs warn that capitalizedString may not produce the desired results for multi-word strings or for strings with special characters. Here's a pretty crude solution:
NSString *text = @"2nd place is nothing";
// break the string into words by separating on spaces.
NSArray *words = [text componentsSeparatedByString:@" "];
// create a new array to hold the capitalized versions.
NSMutableArray *newWords = [[NSMutableArray alloc] init];
// we want to ignore words starting with numbers.
// This class helps us to determine if a string is a number.
NSNumberFormatter *num = [[NSNumberFormatter alloc] init];
for (NSString *item in words) {
    NSString *word = item;
    // if the word is non-empty and its first letter is not a number (numberFromString returns nil)
    if ([item length] > 0 && [num numberFromString:[item substringWithRange:NSMakeRange(0, 1)]] == nil) {
        word = [item capitalizedString]; // capitalize that word.
    }
    // otherwise, don't change the word (empty strings come from consecutive spaces).
    [newWords addObject:word]; // add the word to the new list.
}
NSLog(@"%@", [newWords description]);
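The same idea can be wrapped up so that it returns a rejoined string instead of logging an array. This variant swaps the NSNumberFormatter test for +[NSCharacterSet decimalDigitCharacterSet]. CapitalizeWordsSkippingLeadingDigits is a hypothetical name. Words are still split on single spaces, so the original spacing survives the round trip.

```objc
#import <Foundation/Foundation.h>

// Applies capitalizedString to each space-separated word unless it starts with a digit.
static NSString *CapitalizeWordsSkippingLeadingDigits(NSString *text) {
    NSArray *words = [text componentsSeparatedByString:@" "];
    NSMutableArray *newWords = [NSMutableArray arrayWithCapacity:[words count]];
    NSCharacterSet *digits = [NSCharacterSet decimalDigitCharacterSet];
    for (NSString *word in words) {
        // Empty strings appear between consecutive spaces; keep them as they are.
        if ([word length] > 0 && ![digits characterIsMember:[word characterAtIndex:0]]) {
            [newWords addObject:[word capitalizedString]];
        } else {
            [newWords addObject:word];
        }
    }
    return [newWords componentsJoinedByString:@" "];
}
```

For example, "2nd place is nothing" becomes "2nd Place Is Nothing".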
Unfortunately this seems to be the general behaviour of capitalizedString.
Perhaps a not-so-nice workaround/hack would be to replace each number with a string of letters before the transformation, and then change it back afterwards.
For example: "2nd chance" -> "xyznd chance" -> "Xyznd Chance" -> "2nd Chance"

Split NSString into words, then rejoin it into original form

I am splitting an NSString like this (filterString is an NSString):
NSMutableCharacterSet *separatorSet = [NSMutableCharacterSet whitespaceAndNewlineCharacterSet];
[separatorSet formUnionWithCharacterSet:[NSCharacterSet punctuationCharacterSet]];
NSMutableArray *words = [[filterString componentsSeparatedByCharactersInSet:separatorSet] mutableCopy];
I want to put the words back together in the form of filterString, with the original punctuation and spacing. I want to do this because I need to change some words and then reassemble the string exactly as it was.
A more robust way to split by words is to use string enumeration. A space is not always the delimiter, and not all languages delimit words with spaces anyway (e.g. Japanese).
NSString *string = @" \n word1! word2,%$?'/word3.word4 ";
[string enumerateSubstringsInRange:NSMakeRange(0, string.length)
                           options:NSStringEnumerationByWords
                        usingBlock:
    ^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
        NSLog(@"Substring: '%@'", substring);
    }];
// Logs:
// Substring: 'word1'
// Substring: 'word2'
// Substring: 'word3'
// Substring: 'word4'
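The block also receives substringRange, which is what makes rebuilding the original string possible. Record the ranges of the words you want to change, then replace them in a mutable copy, working from the end so that earlier ranges stay valid. This is a sketch written for ARC. ReplaceWords is a hypothetical name, and a dictionary is an assumed way of saying which words to change.

```objc
#import <Foundation/Foundation.h>

// Replaces each word found as a key in `replacements`, keeping all other characters as-is.
static NSString *ReplaceWords(NSString *text, NSDictionary *replacements) {
    NSMutableArray *ranges = [NSMutableArray array];
    NSMutableArray *substitutes = [NSMutableArray array];
    [text enumerateSubstringsInRange:NSMakeRange(0, [text length])
                             options:NSStringEnumerationByWords
                          usingBlock:^(NSString *substring, NSRange substringRange,
                                       NSRange enclosingRange, BOOL *stop) {
        NSString *replacement = [replacements objectForKey:substring];
        if (replacement != nil) {
            [ranges addObject:[NSValue valueWithRange:substringRange]];
            [substitutes addObject:replacement];
        }
    }];
    NSMutableString *result = [text mutableCopy];
    // Replace from the last range backwards so earlier ranges are not shifted.
    for (NSInteger i = (NSInteger)[ranges count] - 1; i >= 0; i--) {
        [result replaceCharactersInRange:[[ranges objectAtIndex:i] rangeValue]
                              withString:[substitutes objectAtIndex:i]];
    }
    return result;
}
```

With the mapping Hello -> Bye, "Hello, world! Hello." becomes "Bye, world! Bye.", with the punctuation and spacing untouched.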
NSString *myString = @"Foo Bar Blah B..";
NSArray *myWords = [myString componentsSeparatedByCharactersInSet:
                        [NSCharacterSet characterSetWithCharactersInString:@" "]
                   ];
NSString *string = [myWords componentsJoinedByString:@" "];
NSLog(@"%@", string);
Since you eliminate the original punctuation, there's no way to turn it back automatically.
The only way is not to use componentsSeparatedByCharactersInSet.
An alternative solution may be to iterate through the string and, for each character, check whether it belongs to your character set.
If it does, add the character to one list and the preceding substring to another list (you may use the NSMutableArray class).
This way, for example, you know that the punctuation char between the first and the second substring is the first character in your list of separators.
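A variation on that idea keeps a single list in which runs of separators and runs of word characters alternate. Joining the list with an empty string then gives back the original exactly. TokenizeKeepingSeparators is a hypothetical name. It tests UTF-16 units with characterAtIndex:, which assumes your separators are ordinary BMP characters such as ASCII whitespace and punctuation.

```objc
#import <Foundation/Foundation.h>

// Splits text into alternating runs of separator and non-separator characters.
// Joining the runs with @"" reproduces the original string.
static NSArray *TokenizeKeepingSeparators(NSString *text, NSCharacterSet *separators) {
    NSMutableArray *tokens = [NSMutableArray array];
    NSCharacterSet *nonSeparators = [separators invertedSet];
    NSUInteger length = [text length];
    NSUInteger location = 0;
    while (location < length) {
        // The kind of run that starts here decides which characters end it.
        BOOL inSeparatorRun = [separators characterIsMember:[text characterAtIndex:location]];
        NSCharacterSet *endSet = inSeparatorRun ? nonSeparators : separators;
        // The first character always belongs to the run, so search for its end from the next one.
        NSRange rest = NSMakeRange(location + 1, length - location - 1);
        NSRange next = [text rangeOfCharacterFromSet:endSet options:0 range:rest];
        NSUInteger end = (next.location == NSNotFound) ? length : next.location;
        [tokens addObject:[text substringWithRange:NSMakeRange(location, end - location)]];
        location = end;
    }
    return tokens;
}
```

Change the word tokens you care about (every other entry, starting with whichever kind begins the string), then call componentsJoinedByString:@"" to reassemble the text.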
You can use NSArray's componentsJoinedByString: method to rejoin the words (this puts a single space between them rather than the original separators):
NSString *orig = [words componentsJoinedByString:@" "];
How are you determining which words need to be replaced? Instead of breaking it apart in the first place, perhaps using -stringByReplacingOccurrencesOfString:withString:options:range: would be more suitable.
My guess is you may not be using the best API. If you're really worried about words, you should be using a word-based API. I'm a bit hazy on whether that would be NSDataDetector or something else. (I believe NSRegularExpression can deal with word boundaries in a smarter way.)
If you are using Mac OS X 10.7+ or iOS 4+, you can use NSRegularExpression. The pattern to match a whole word is "\bword\b", where \b matches a word boundary (in an Objective-C string literal, the backslashes must be doubled: @"\\bword\\b"). Look at the methods replaceMatchesInString:options:range:withTemplate: and stringByReplacingMatchesInString:options:range:withTemplate:.
Under 10.6 or earlier, if you wish to use regular expressions, you can wrap the regcomp/regexec C functions; they support word boundaries as well. However, you may prefer one of the other Cocoa options mentioned in other answers for this simple case.
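A minimal sketch of the NSRegularExpression approach with stringByReplacingMatchesInString:options:range:withTemplate:. ReplaceWholeWord is a hypothetical name. The word goes through +escapedPatternForString: and the replacement goes through +escapedTemplateForString:, so characters such as "." or "$" in either are treated literally.

```objc
#import <Foundation/Foundation.h>

// Replaces every whole-word occurrence of `word` with `replacement`.
static NSString *ReplaceWholeWord(NSString *text, NSString *word, NSString *replacement) {
    NSString *pattern = [NSString stringWithFormat:@"\\b%@\\b",
                                  [NSRegularExpression escapedPatternForString:word]];
    NSError *error = nil;
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern
                                                                           options:0
                                                                             error:&error];
    if (regex == nil) {
        return text;
    }
    // Escaping the template keeps "$1" or "\" in the replacement from being interpreted.
    NSString *template = [NSRegularExpression escapedTemplateForString:replacement];
    return [regex stringByReplacingMatchesInString:text
                                           options:0
                                             range:NSMakeRange(0, [text length])
                                      withTemplate:template];
}
```

For example, replacing "cat" with "dog" in "cat catapult cat." gives "dog catapult dog.".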

Is there a way to get Spell Check data from an NSString?

I'm writing a simple shift cipher iPhone app as a pet project, and one piece of functionality I'm currently designing is a "universal" decryption of an NSString that returns an NSArray of NSStrings:
- (NSArray*) decryptString: (NSString*)ciphertext{
NSMutableArray* theDecryptions = [NSMutableArray arrayWithCapacity:ALPHABET];
for (int i = 0; i < ALPHABET; ++i) {
NSString* theNewPlainText = [self decryptString:ciphertext ForShift:i];
[theDecryptions insertObject:theNewPlainText
atIndex:i];
}
return theDecryptions;
}
I'd really like to pass this NSArray into another method that attempts to spell check each individual string within the array, and builds a new array that puts the strings with the fewest typo'd words at lower indices, so they're displayed first. I'd like to use the system's dictionary like a text field would, so I can match against words that have been trained into the phone by its user.
My current guess is to split a given string up into words, then spell check each with NSSpellChecker's -checkSpellingOfString:startingAt: and use the number of correct words to sort the array. Is there an existing library method or well-accepted pattern that would help return such a value for a given string?
Well, I found a solution that works using UIKit's UITextChecker. It correctly finds the user's most preferred language dictionary, but I'm not sure whether rangeOfMisspelledWordInString:... itself takes learned words into account. If it doesn't, calling [UITextChecker hasLearnedWord:currentWord] inside the bottom if/else should be enough to find user-taught words.
As noted in the comments, it may be prudent to call rangeOfMisspelledWordInString:... with each of the top few languages in [UITextChecker availableLanguages], to help multilingual users.
-(void) checkForDefinedWords {
    NSArray* words = [message componentsSeparatedByString:@" "];
    NSInteger wordsFound = 0;
    UITextChecker* checker = [[UITextChecker alloc] init];
    //get the first language in the checker's memory- this is the user's
    //preferred language.
    //TODO: May want to search with every language (or top few) in the array
    NSString* preferredLang = [[UITextChecker availableLanguages] objectAtIndex:0];
    //for each word in the array, determine whether it is a valid word
    for(NSString* currentWord in words){
        //skip the empty strings produced by consecutive spaces
        if ([currentWord length] == 0) {
            continue;
        }
        NSRange range = [checker rangeOfMisspelledWordInString:currentWord
                                                         range:NSMakeRange(0, [currentWord length])
                                                    startingAt:0
                                                          wrap:NO
                                                      language:preferredLang];
        //if it is valid (no errors found), increment wordsFound
        if (range.location == NSNotFound) {
            //NSLog(@"%@ %@", @"Valid Word found:", currentWord);
            wordsFound++;
        }
        else {
            //NSLog(@"%@ %@", @"Invalid Word found:", currentWord);
        }
    }
    //After all "words" have been searched, save wordsFound to validWordCount
    [self setValidWordCount:wordsFound];
    [checker release]; // manual reference counting; remove this line under ARC
}
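The question's last step was to sort the decryptions so the most readable one comes first. Per-string counts like wordsFound can drive that sort. This is a sketch under two assumptions: you wrap the counting loop above in a block that returns the count for one string, and SortByScoreDescending is a hypothetical name. Each candidate is scored only once, outside the comparator, because spell checking is comparatively slow.

```objc
#import <Foundation/Foundation.h>

// Returns `candidates` ordered by descending score; equal scores keep their input order.
static NSArray *SortByScoreDescending(NSArray *candidates, NSInteger (^score)(NSString *)) {
    NSMutableArray *entries = [NSMutableArray arrayWithCapacity:[candidates count]];
    for (NSString *candidate in candidates) {
        // Pair each candidate with its precomputed score.
        [entries addObject:[NSArray arrayWithObjects:
                                [NSNumber numberWithInteger:score(candidate)], candidate, nil]];
    }
    [entries sortWithOptions:NSSortStable usingComparator:^NSComparisonResult(id a, id b) {
        // Comparing b to a puts higher scores (more valid words) first.
        return [[b objectAtIndex:0] compare:[a objectAtIndex:0]];
    }];
    NSMutableArray *sorted = [NSMutableArray arrayWithCapacity:[entries count]];
    for (NSArray *entry in entries) {
        [sorted addObject:[entry objectAtIndex:1]];
    }
    return sorted;
}
```

Because the scorer is passed in, the same function works with UITextChecker on iOS or with any other word counter.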