Is there a way to get Spell Check data from an NSString? - objective-c

I'm writing a simple shift cipher iPhone app as a pet project, and one piece of functionality I'm currently designing is a "universal" decryption of an NSString, that returns an NSArray, all of NSStrings:
- (NSArray*) decryptString: (NSString*)ciphertext{
NSMutableArray* theDecryptions = [NSMutableArray arrayWithCapacity:ALPHABET];
for (int i = 0; i < ALPHABET; ++i) {
NSString* theNewPlainText = [self decryptString:ciphertext ForShift:i];
[theDecryptions insertObject:theNewPlainText
atIndex:i];
}
return theDecryptions;
}
I'd really like to pass this NSArray into another method that attempts to spell check each individual string within the array, and builds a new array that puts the strings with the fewest misspelled words at lower indices, so they're displayed first. I'd like to use the system's dictionary like a text field would, so I can match against words that have been trained into the phone by its user.
My current guess is to split a given string up into words, spell check each one with NSSpellChecker's -checkSpellingOfString:startingAt:, and use the number of correct words to sort the array. Is there an existing library method or well-accepted pattern that would help return such a value for a given string?

Well, I found a solution that works using UIKit's UITextChecker. It correctly finds the user's most preferred language dictionary, but I'm not sure whether the rangeOfMisspelledWordInString:range:startingAt:wrap:language: method takes learned words into account. If it doesn't, calling [UITextChecker hasLearnedWord:currentWord] inside the bottom if statement should be enough to find user-taught words.
As noted in the comments, it may be prudent to call rangeOfMisspelledWordInString:range:startingAt:wrap:language: with each of the top few languages in [UITextChecker availableLanguages], to help multilingual users.
-(void) checkForDefinedWords {
NSArray* words = [message componentsSeparatedByString:@" "];
NSInteger wordsFound = 0;
UITextChecker* checker = [[UITextChecker alloc] init];
//get the first language in the checker's memory- this is the user's
//preferred language.
//TODO: May want to search with every language (or top few) in the array
NSString* preferredLang = [[UITextChecker availableLanguages] objectAtIndex:0];
//for each word in the array, determine whether it is a valid word
for(NSString* currentWord in words){
NSRange range;
range = [checker rangeOfMisspelledWordInString:currentWord
range:NSMakeRange(0, [currentWord length])
startingAt:0
wrap:NO
language:preferredLang];
//if it is valid (no errors found), increment wordsFound
if (range.location == NSNotFound) {
//NSLog(@"%@ %@", @"Valid Word found:", currentWord);
wordsFound++;
}
else {
//NSLog(@"%@ %@", @"Invalid Word found:", currentWord);
}
}
//After all "words" have been searched, save wordsFound to validWordCount
[self setValidWordCount:wordsFound];
[checker release];
}
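The ranking step the question asks for (fewest misspelled words first) can be sketched independently of UIKit. The following is a minimal, language-neutral sketch in Python, not the asker's implementation: `KNOWN_WORDS` is a hypothetical stand-in for the system dictionary that UITextChecker would consult, and `shift`, `misspelled_count`, and `rank_decryptions` are illustrative names that do not appear in the original code.

```python
import re

# Hypothetical stand-in for the system dictionary; on iOS the lookup would
# go through UITextChecker instead of a hard-coded set.
KNOWN_WORDS = {"attack", "at", "dawn", "the", "a"}

def misspelled_count(text, known_words=KNOWN_WORDS):
    """Count the words in `text` that are not in `known_words`."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    return sum(1 for word in words if word not in known_words)

def rank_decryptions(candidates, known_words=KNOWN_WORDS):
    """Return the candidates ordered so the fewest-typo strings come first."""
    # sorted() is stable, so ties keep their original (shift) order.
    return sorted(candidates, key=lambda text: misspelled_count(text, known_words))

def shift(text, k):
    """Caesar-shift lowercase letters by k; other characters pass through."""
    return "".join(
        chr((ord(c) - 97 + k) % 26 + 97) if "a" <= c <= "z" else c for c in text
    )

# Build all 26 candidate decryptions and pick the best-scoring one.
ciphertext = shift("attack at dawn", 3)
candidates = [shift(ciphertext, -k) for k in range(26)]
best = rank_decryptions(candidates)[0]
```

Sorting by the number of misspelled words, rather than the number of valid ones, keeps the "fewest typos first" ordering even when candidates split into different numbers of words.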

Related

Printing the most frequent words in a file(string) Objective-C

New to objective-c, need help to solve this:
Write a function that takes two parameters:
1 a String representing a text document and
2 an integer providing the number of items to return. Implement the function such that it returns a list of Strings ordered by word frequency, the most frequently occurring word first. Use your best judgement to decide how words are separated. Your solution should run in O(n) time where n is the number of characters in the document. Implement this function as you would for a production/commercial system. You may use any standard data structures.
What I tried so far (work in progress):
// Function work in progress
// -(NSString *) wordFrequency:(int)itemsToReturn inDocument:(NSString *)textDocument ;
#import <Foundation/Foundation.h>
int main(int argc, const char * argv[]) {
@autoreleasepool {
// Get the desktop directory (where the text document is)
NSURL *desktopDirectory = [[NSFileManager defaultManager] URLForDirectory:NSDesktopDirectory inDomain:NSUserDomainMask appropriateForURL:nil create:NO error:nil];
// Create full path to the file
NSURL *fullPath = [desktopDirectory URLByAppendingPathComponent:@"document.txt"];
// Load the string
NSString *content = [NSString stringWithContentsOfURL:fullPath encoding:NSUTF8StringEncoding error:nil];
// Optional code for confirmation - Check that the file is here and print its content to the console
// NSLog(@" The string is:%@", content);
// Create an array with the words contained in the string
NSArray *myWords = [content componentsSeparatedByString:@" "];
// Optional code for confirmation - Print content of the array to the console
// NSLog(@"array: %@", myWords);
// Build an NSCountedSet from the array, then sort the words in descending order of their counts.
NSCountedSet *countedSet = [[NSCountedSet alloc] initWithArray:myWords];
NSMutableArray *dictArray = [NSMutableArray array];
[countedSet enumerateObjectsUsingBlock:^(id obj, BOOL *stop) {
[dictArray addObject:@{@"word": obj,
@"count": @([countedSet countForObject:obj])}];
}];
NSLog(@"Words sorted by count: %@", [dictArray sortedArrayUsingDescriptors:@[[NSSortDescriptor sortDescriptorWithKey:@"count" ascending:NO]]]);
}
return 0;
}
This is a classic job for map-reduce. I am not very familiar with Objective-C, but as far as I know, these concepts are very easily implemented in it.
The first map-reduce counts the number of occurrences.
This step basically groups elements by word and then counts them.
map(text):
for each word in text:
emit(word,'1')
reduce(word,list<number>):
emit (word,sum(number))
An alternative to map-reduce is an iterative pass over the text with a hash map that serves as a histogram, counting the number of occurrences of each word.
Once you have a list of words and their occurrence counts, all you have to do is get the top k of them. This is nicely explained in this thread: Store the largest 5000 numbers from a stream of numbers.
Here, the 'comparator' is the number of occurrences of each word, as calculated in the previous step.
The basic idea is to use a min-heap and store the first k elements in it.
Now iterate over the remaining elements, and if the new one is bigger than the top (the minimal element in the heap), remove the top and replace it with the new element.
At the end, you have a heap containing the k largest elements. Popping them one at a time yields them in ascending order, so reversing that sequence gives the most frequent word first.
Complexity is O(n log k).
To achieve O(n + k log k), you may use a selection algorithm instead of the min-heap to get the top k, and then sort the retrieved elements.
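The histogram-plus-min-heap approach described above can be sketched as follows. This is an illustrative Python sketch rather than Objective-C, and it rests on assumptions: the function name `top_k_words` is hypothetical, and the word definition (runs of letters, digits, and apostrophes, case-folded) is one possible reading of "use your best judgement".

```python
import heapq
import re
from collections import Counter

def top_k_words(text, k):
    """Return the k most frequent words, most frequent first."""
    if k <= 0:
        return []
    # Step 1: histogram of word counts, built in one pass over the text.
    counts = Counter(re.findall(r"[a-z0-9']+", text.lower()))
    # Step 2: min-heap of (count, word); the root is the weakest kept entry.
    heap = []
    for word, count in counts.items():
        if len(heap) < k:
            heapq.heappush(heap, (count, word))
        elif (count, word) > heap[0]:
            # The new entry beats the current minimum, so swap it in.
            heapq.heapreplace(heap, (count, word))
    # Step 3: sort the k survivors, largest count first (O(k log k)).
    return [word for count, word in sorted(heap, reverse=True)]
```

For example, `top_k_words("the cat and the hat and the bat", 2)` keeps only two entries in the heap at any time. Ties between equal counts are broken by comparing the words themselves, which is an arbitrary choice made for this sketch.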

Call a method on every word in NSString

I would like to loop through an NSString and call a custom function on every word that meets a certain criterion (for example, "has 2 'L's"). I was wondering what the best way of approaching that was. Should I use Find/Replace patterns? Blocks?
-(NSString *)convert:(NSString *)wordToConvert{
/// This I have already written
return finalWord;
}
-(NSString *) method:(NSString *) sentenceContainingWords{
// match every word that meets the criteria (for example the 2Ls) and replace it with what convert: does.
}
To enumerate the words in a string, you should use -[NSString enumerateSubstringsInRange:options:usingBlock:] with NSStringEnumerationByWords and NSStringEnumerationLocalized. All of the other methods listed use a means of identifying words which may not be locale-appropriate or correspond to the system definition. For example, two words separated by a comma but not whitespace (e.g. "foo,bar") would not be treated as separate words by any of the other answers, but they are in Cocoa text views.
[aString enumerateSubstringsInRange:NSMakeRange(0, [aString length])
options:NSStringEnumerationByWords | NSStringEnumerationLocalized
usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop){
if ([substring rangeOfString:@"ll" options:NSCaseInsensitiveSearch].location != NSNotFound)
/* do whatever */;
}];
As documented for -enumerateSubstringsInRange:options:usingBlock:, if you call it on a mutable string, you can safely mutate the string being enumerated within the enclosingRange. So, if you want to replace the matching words, you can with something like [aString replaceCharactersInRange:substringRange withString:replacementString].
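The overall pattern (find each word, test it against a criterion, replace it with the converted form) can be sketched in a language-neutral way. The Python sketch below is only an illustration under assumptions: `convert` is a hypothetical stand-in for the asker's convert: method, `has_two_ls` is one possible reading of the "has 2 'L's" criterion, and `\w+` is a simpler word definition than the locale-aware one Cocoa uses.

```python
import re

def convert(word):
    """Hypothetical stand-in for the asker's convert: method."""
    return word.upper()

def has_two_ls(word):
    """Example criterion: the word contains at least two 'l' characters."""
    return word.lower().count("l") >= 2

def convert_matching_words(sentence, criterion=has_two_ls, transform=convert):
    """Replace each word that meets `criterion` with `transform(word)`."""
    def replace(match):
        word = match.group(0)
        return transform(word) if criterion(word) else word
    # \w+ finds words regardless of the punctuation between them,
    # so "hello,world" is treated as two words.
    return re.sub(r"\w+", replace, sentence)
```

Because the replacement is done per match, the text between words (spaces, commas) is left exactly as it was.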
The two ways I know of looping an array that will work for you are as follows:
NSArray *words = [sentence componentsSeparatedByCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
for (NSString *word in words)
{
NSString *transformedWord = [obj method:word];
}
and
NSArray *words = [sentence componentsSeparatedByCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
[words enumerateObjectsWithOptions:NSEnumerationConcurrent usingBlock:^(id word, NSUInteger idx, BOOL *stop){
NSString *transformedWord = [obj method:word];
}];
The other method, –makeObjectsPerformSelector:withObject:, won't work for you. It expects to be able to call [word method:obj] which is backwards from what you expect.
If you could write your criteria with regular expressions, then you could probably do a regular expression matching to fetch these words and then pass them to your convert: method.
You could also do a split of string into an array of words using componentsSeparatedByString: or componentsSeparatedByCharactersInSet:, then go over the words in the array and detect if they fit your criteria somehow. If they fit, then pass them to convert:.
Hope this helps.
As of iOS 12/macOS 10.14 the recommended way to do this is with the Natural Language framework.
For example:
import NaturalLanguage
let myString = "..."
let tokeniser = NLTokenizer(unit: .word)
tokeniser.string = myString
tokeniser.enumerateTokens(in: myString.startIndex..<myString.endIndex) { wordRange, attributes in
performActionOnWord(myString[wordRange])
return true // or return false to stop enumeration
}
Using NLTokenizer also has the benefit of allowing you to optionally specify the language of the string beforehand:
tokeniser.setLanguage(.hebrew)
I would recommend using a while loop to go through the string like this.
NSUInteger length = [sentenceContainingWords length];
NSUInteger wordStart = 0;
while (wordStart < length) {
//look for the next space at or after the start of the current word
NSRange spaceRange = [sentenceContainingWords rangeOfString:@" " options:0 range:NSMakeRange(wordStart, length - wordStart)];
NSUInteger wordEnd = (spaceRange.location == NSNotFound) ? length : spaceRange.location;
if (wordEnd > wordStart) {
//the range stops before the space, so spaces are not included in the strings
NSString *wordString = [sentenceContainingWords substringWithRange:NSMakeRange(wordStart, wordEnd - wordStart)];
[self convert:wordString];
}
wordStart = wordEnd + 1;
}
This code would probably need to be rewritten because it's pretty rough, but you should get the idea.
Edit: Just saw Jacob Gorban's post, you should definitely do it like that.

capitalizedString doesn't capitalize correctly words starting with numbers?

I'm using the NSString method [myString capitalizedString], to capitalize all words of my string.
However capitalization doesn't work very well for words starting with numbers.
i.e. 2nd chance
becomes
2Nd Chance
Even if n is not the first letter of the word.
thanks
You have to roll your own solution to this problem. The Apple docs state that you may not get the specified behavior using that function for multi-word strings and for strings with special characters. Here's a pretty crude solution:
NSString *text = @"2nd place is nothing";
// break the string into words by separating on spaces.
NSArray *words = [text componentsSeparatedByString:@" "];
// create a new array to hold the capitalized versions.
NSMutableArray *newWords = [[NSMutableArray alloc]init];
// we want to ignore words starting with numbers.
// This class helps us to determine if a string is a number.
NSNumberFormatter *num = [[NSNumberFormatter alloc]init];
for (NSString *item in words) {
NSString *word = item;
// if the word is non-empty and its first character is not a number (numberFromString returns nil)
if ([item length] > 0 && [num numberFromString:[item substringWithRange:NSMakeRange(0, 1)]] == nil) {
word = [item capitalizedString]; // capitalize that word.
}
// if it is a number, don't change the word (this is implied).
[newWords addObject:word]; // add the word to the new list.
}
NSLog(@"%@", [newWords description]);
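The same idea (capitalize every word except those that start with a digit) can be expressed compactly. The following Python sketch is only an illustration of the logic, not a drop-in replacement for capitalizedString; the function name `capitalize_words` is hypothetical, and splitting on single spaces is an assumption carried over from the code above.

```python
def capitalize_words(text):
    """Capitalize each space-separated word unless it starts with a digit."""
    result = []
    for word in text.split(" "):
        # Empty pieces (from repeated spaces) and digit-led words pass through.
        if word and not word[0].isdigit():
            # Upper-case the first character and lower-case the rest,
            # which mirrors what capitalizedString does to a plain word.
            word = word[0].upper() + word[1:].lower()
        result.append(word)
    return " ".join(result)
```

With this rule, "2nd chance" comes out as "2nd Chance", and repeated spaces are preserved because the empty pieces are joined back unchanged.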
Unfortunately this seems to be the general behaviour of capitalizedString.
Perhaps a not so nice workaround / hack would be to replace each number with a string before the transformation, and then change it back afterwards.
So, "2nd chance" -> "xyznd chance" -> "Xyznd Chance" -> "2nd Chance"

Finding 2 Capitalized Words in a Row NSString

I'm writing a Mac app that goes through an NSString and adds all its words to an NSArray (by separating them based on whitespace). Now, I've got the whole system down, but I'm still having one little problem: names (first + last) are added as two different words, and that's bothersome to me.
I thought of a couple solutions to fix this. My best idea was to, before actually adding the words to the array, join two words in a row that are capitalized. Then, through an if statement, determine if a word has two capitals in it, and then split the word and add it as one word. However, I can't find a way to find 2 words in a row with capitals.
Should I be using RegexKitLite (which I'm not familiar with), for example, to find two capitalized words in a row? I've seen this question: Regexp to pull capitalized words not at the beginning of sentence and two adjacent words
which seems somehow related, but due to my lack of understanding of regular expressions, I don't really know if this is exactly what I need.
I've also seen this: Separating NSString into NSArray, but allowing quotes to group words
which is also similar, yet not exactly adapted to my needs.
So, to conclude, does anyone know how to either join capitalized words in an NSString, or even better, how to find two capitalized words in a row in an NSString ?
If you're targeting iOS 4.0 or later, or OS X 10.7 or later, you can use NSRegularExpression:
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression
regularExpressionWithPattern:@"[A-Z]\\w*\\s[A-Z]\\w*"
options:0
error:&error];
NSString *inputString = @"One two Three Four five six Seven Eight";
NSArray *stringsWithTwoCapitalizedWordsInARow = [regex
matchesInString:inputString
options:0
range:NSMakeRange(0, [inputString length])];
The array contains NSTextCheckingResult objects; passing each result's range to -substringWithRange: on inputString gives something like this:
["Three Four", "Seven Eight"]
You could just do a second pass on the resulting array after it has been loaded to append entries together that need to be joined.
Names are notoriously difficult to match with regular expressions alone, as it is not unheard of for names (first or last) to contain spaces themselves.
NSMutableArray* words = ...;
NSMutableArray* joinedWords = [NSMutableArray array];
for (int i=0; i < [words count]; i++)
{
NSString* currentLine = [words objectAtIndex:i];
bool capitalized = false;
bool capitalizedNext = false;
capitalized = isCap(currentLine); // Up to your discretion here
NSString* nextLine = nil;
// for the last entry
if (i+1 < [words count])
{
nextLine = [words objectAtIndex:i+1];
capitalizedNext = isCap(nextLine);
}
// Check if first letter is uppercase
if (capitalized == true && capitalizedNext == true)
{
[words replaceObjectAtIndex:i withObject:[NSString stringWithFormat:@"%@ %@", currentLine, nextLine]];
[words removeObjectAtIndex:i+1];
// Run test again on new version of the line
i--;
}
else
{
[joinedWords addObject:currentLine];
}
}
[A-Z][A-Za-z]* [A-Z][A-Za-z]*|[\S]*
http://rubular.com/r/DrOabOAfBr
I've written a regular expression for you. This regex will try to match a name first, then fall back to a word, so your job is as simple as feeding this into NSRegularExpression, and take all the matches as your words, or names joined.
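To see how the "name first, then any word" alternation behaves, here is a small Python sketch. It is an illustration under one stated change: the fallback uses `\S+` rather than `[\S]*`, because a pattern that can match the empty string produces empty matches between words. The names `TOKEN_PATTERN` and `split_keeping_names` are hypothetical.

```python
import re

# Same idea as the pattern above: try "Capitalized Capitalized" first, then
# fall back to any run of non-whitespace characters. The fallback is \S+
# (an assumption made here) so that empty matches are never produced.
TOKEN_PATTERN = re.compile(r"[A-Z][A-Za-z]* [A-Z][A-Za-z]*|\S+")

def split_keeping_names(text):
    """Split text into words, keeping two adjacent capitalized words together."""
    return TOKEN_PATTERN.findall(text)
```

One limitation worth keeping in mind: a capitalized word at the start of a sentence followed by a name (for example "Yesterday John Smith") is grouped as "Yesterday John" and "Smith", since the pattern has no way of telling a sentence start from a first name.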

Weird cocoa bug?

Hey folks, below is a piece of code I used for a school assignment.
Whenever I enter a word with an O in it (a capital o), it fails!
Whenever there are one or more capital O's in the input, it returns false and logs: sentence not a palindrome.
A palindrome, for the people that don't know what a palindrome is, is a word that reads the same from left to right and backwards (e.g. lol, kayak, reviver etc.).
I found this bug when trying to check the 'oldest' palindrome ever found: SATOR AREPO TENET OPERA ROTAS.
When I change all the capital o's to lowercase o's, it works, and returns true.
Let me state clearly, with this piece of code ALL sentences/words with capital O's return false. A single capital o is enough to fail this program.
-(BOOL)testForPalindrome:(NSString *)s position:(NSInteger)pos {
NSString *string = s;
NSInteger position = pos;
NSInteger stringLength = [string length];
NSString *charOne = [string substringFromIndex:position];
charOne = [charOne substringToIndex:1];
NSString *charTwo = [string substringFromIndex:(stringLength - 1 - position)];
charTwo = [charTwo substringToIndex:1];
if(position > (stringLength / 2)) {
NSString *printableString = [NSString stringWithFormat:@"The following word or sentence is a palindrome: \n\n%@", string];
NSLog(@"%@ is a palindrome.", string);
[textField setStringValue:printableString];
return YES;
}
if(charOne != charTwo) {
NSLog(@"%@, %@", charOne, charTwo);
NSLog(@"%i", position);
NSLog(@"%@ is not a palindrome.", string);
return NO;
}
return [self testForPalindrome:string position:position+1];
}
So, is this some weird bug in Cocoa?
Or am I missing something?
B
This of course is not a bug in Cocoa, as you probably knew deep down inside.
Your comparison is causing this 'bug in Cocoa': != compares the addresses of charOne and charTwo, not their contents. Instead you should compare the contents of the strings with the isEqualToString: message.
Use:
if(![charOne isEqualToString:charTwo]) {
Instead of:
if(charOne != charTwo) {
Edit: tested it in a test project and can confirm this is the problem.
Don't use charOne != charTwo
Instead use one of the NSString Compare Methods.
if ([charOne caseInsensitiveCompare:charTwo] != NSOrderedSame)
It may also have to do with localization (but I doubt it).
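For comparison, here is the palindrome check written as a simple two-pointer loop that compares character values rather than object addresses. This is a Python sketch for illustration only; in Python, `!=` on strings compares contents, which is the behaviour isEqualToString: provides in Objective-C. The function name `is_palindrome` and the `ignore_case` parameter are hypothetical.

```python
def is_palindrome(text, ignore_case=True):
    """Return True if `text` reads the same forwards and backwards."""
    if ignore_case:
        # Counterpart of caseInsensitiveCompare: fold case before comparing.
        text = text.lower()
    left, right = 0, len(text) - 1
    while left < right:
        # Compare the characters' values, the analogue of isEqualToString:
        if text[left] != text[right]:
            return False
        left += 1
        right -= 1
    return True
```

The loop stops as soon as the two positions meet in the middle, so it also handles empty and single-character strings without any special case.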