UITextChecker 25 Letter Words - objective-c

I believe this is an Apple bug, but wanted to run it by you all and see if anyone else had run into the same/similar issues.
Put simply, Apple's UITextChecker treats every word of 25 letters or more as valid and correctly spelled. Go ahead and open Notes on your iOS device (or TextEdit on OS X) and type a random 24-letter word. Hit enter: underlined in red, right? Now add one more letter so that it is a 25-letter word. Hit enter again: underlined in red, right... nope!
I don't know if this is related, but I have a similar unanswered question out there (UITextChecker is what dictionary?) questioning what dictionary is used for UITextChecker. In /usr/share/dict/words the longest word is 24 letters. Seems rather coincidental that 25 letters would be the first length of word that is not in the dictionary and it is always accepted as a valid word. But I don't know if that word list is the dictionary for UITextChecker.
This is important to note for anyone who might be checking the spelling of a given word for something like a game. You really don't want players to be able to use 25 random letters to spell a "word" and most likely score massive points.
Here's my code to check for valid words:
    - (BOOL)isValidWord:(NSString *)word {
        // word is all lowercase
        UITextChecker *checker = [[UITextChecker alloc] init];
        NSRange searchRange = NSMakeRange(0, [word length]);
        NSRange misspelledRange = [checker rangeOfMisspelledWordInString:word range:searchRange startingAt:0 wrap:NO language:@"en"];
        [checker release];
        BOOL validWord = (misspelledRange.location == NSNotFound);
        BOOL passOneCharTest = ([word length] > 1 || [word isEqualToString:@"a"] || [word isEqualToString:@"i"]);
        BOOL passLengthTest = ([word length] > 0 && [word length] < 25); // I don't know any words more than 24 letters long
        return validWord && passOneCharTest && passLengthTest;
    }
So my question to the community: is this a documented 'feature' that I just haven't been able to locate?

This is likely to be caused by the algorithm used for spell-checking itself although I admit it sounds like a bit of a hole.
Even spell-checkers that use a dictionary often tend to use an algorithm to get rid of false negatives. The classic is to ignore:
(a) single-character words followed by certain punctuation (like that (a) back there); and
(b) words consisting of all uppercase like NATO or CHOGM, assuming that they're quite valid acronyms.
If the algorithm for UITextChecker also considers 25+-letter words to be okay, that's just one of the things you need to watch out for.
It may well be related to the expected use case: UITextChecker may be intended not so much as a perfect checker but more as a best-guess solution.
If you really want a perfect filter, you're probably better off doing your own, using a copy of the dictionary from somewhere. That way, you can exclude things that aren't valid in your game (acronyms in Scrabble®, for example).
You can also ensure you're not subject to the vagaries of algorithms that assume longer words are valid as appears to be the case here. Instead you could just assume any word not in your dictionary is invalid (but, of course, give the user the chance to add it if your dictionary is wrong).
Other than that, and filing a query/bug with Apple, there's probably not much else you can do.
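The own-dictionary approach suggested above could look roughly like the following. This is only a minimal sketch in plain C (which also compiles inside an Objective-C file), not anything UITextChecker does internally. The word list `kWords` and the names `compare_words` and `is_valid_word` are hypothetical; a real game would load a full, sorted word list from a file.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sample word list. It must stay sorted in strcmp order,
   because bsearch only works on sorted data. */
static const char *const kWords[] = {
    "a", "apple", "i", "word", "zebra"
};
static const size_t kWordCount = sizeof kWords / sizeof kWords[0];

/* bsearch passes the key first and a pointer to an array element second. */
static int compare_words(const void *key, const void *element) {
    return strcmp((const char *)key, *(const char *const *)element);
}

/* Any word that is not in the list is treated as invalid, whatever its length. */
bool is_valid_word(const char *word) {
    if (word == NULL || word[0] == '\0') {
        return false;
    }
    return bsearch(word, kWords, kWordCount, sizeof kWords[0], compare_words) != NULL;
}
```

With this design a random 25-letter string is rejected simply because it is not in the list, so no separate length check is needed.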

Custom NSFormatter to accept only numbers

I have searched for an answer and tried a lot of user examples posted at SO, however they do not seem to answer my question.
In the UK the majority of area codes begin with zero. I have a single NSTextField and have created a custom NSNumberFormatter. I want my NSTextField to accept numbers beginning with zero. I don't want to use the NSNumberFormatter padding option, as the length of the phone number may vary, but it will always start with zero.
    - (BOOL)isPartialStringValid:(NSString *)partialString newEditingString:(NSString **)newString errorDescription:(NSString **)error {
        if (partialString.length <= 0 || [partialString rangeOfCharacterFromSet:[[NSCharacterSet characterSetWithCharactersInString:@"0123456789."] invertedSet]].location != NSNotFound) {
            NSLog(@"This is not a positive integer");
            return NO;
        }
        return YES;
    }
    @end
The above example works and allows a number of any length to be entered, but it always removes the leading zero when focus moves away from the NSTextField.
Example numbers:
01202
01134
01103111345
How can I stop the leading zero being removed?
Thank you for reading.
Have you tried a library for managing phone numbers, such as libPhoneNumber-iOS? It has most of this covered. Granted, it may be overkill if you're only working with UK numbers.
I could copy the code here, but it is probably more efficient to check out the framework. It has been around for years, so it is well tested, and I've used it many times when working for O2 on their apps. If you decide not to use it, just look through the code; you'll find your answer there.
I see you're targeting the Mac. You can do it there too, but it takes some effort.

method for Comparing two NSString for spelling mistakes iPhone programming

I am writing an iOS app for a game that is similar to Hangman, except that the player is required to guess the secret word one letter at a time, starting with the first letter. The secret word is displayed as asterisks (*) in a UITextField at the beginning of the game.
When the player guesses the first letter, the program should compare it against the secret word to see if the letter is correct. If the guess is correct, the app should replace the first asterisk with the correct letter. If the guess is incorrect, some other action will be taken. The player repeats this process one letter at a time until the secret word has been completely spelled out.
Here is the code I am currently using to check the guessed letter against the secret word, but it is not working properly.
    -(void) checkGameLetter : (NSString *) letterToCheck{
        bool match = NO;
        NSRange gameLetterRange;
        char charToCheck = [letterToCheck characterAtIndex:0];
        for(int i = 0; i < self.correctWord.length; i++)
        {
            char tempString = [self.correctWord characterAtIndex:i];
            if(charToCheck == tempString){
                match = YES;
                gameLetterRange = NSMakeRange(i, 1);//location, length
                Screen.text =[Screen.text stringByReplacingCharactersInRange:gameLetterRange withString:letterToCheck];
            }
        }
    }
The thing that's wrong with your code is that nothing in it says which letter of the correct word we are checking against.
For example, suppose the word is "zork" and the user guesses "r". You are trying to walk through "zork" looking to see if "r" matches any letter. But according to your spec, if this is a guess at the first letter, we should just be checking against the first letter ("z") and stop, since the "r" is wrong in that position.
So what you want to write is much simpler than the code you have. You don't want this:
-(void) checkGameLetter : (NSString *) letterToCheck{
You want this:
-(void) checkGameLetter:(NSString*)letterToCheck againstPosition:(NSInteger)position {
And there will be no loop: you will just look right at the letter in that position and see if they are the same.
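Finally, notice this important fact: == does not compare two strings. Applied to NSString objects, it asks whether they are the same object, which two separately created strings generally are not; for that you want isEqualToString:. Comparing two char values with ==, as your loop does, is fine.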
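The positional check described above can be sketched without any loop. This is a minimal illustration in plain C rather than the asker's actual method; `check_letter_at`, `secret`, and `display` are hypothetical names, and `display` stands in for the asterisk-masked text shown on screen (assumed to be the same length as `secret`).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Compares `guess` only against the letter at `position` in `secret`.
   On a match the letter is revealed in `display` and true is returned;
   otherwise `display` is left untouched and false is returned. */
bool check_letter_at(const char *secret, char *display, size_t position, char guess) {
    if (position >= strlen(secret)) {
        return false; /* position is past the end of the word */
    }
    if (secret[position] != guess) {
        return false; /* wrong letter for this position */
    }
    display[position] = guess;
    return true;
}
```

The caller would keep track of the current position and advance it only after a correct guess.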

how to insert extra glyphs?

I want a UITextView to switch between two display modes.
In mode 1 it should show abbreviations, and in mode 2 the full words. For example "Abbr." vs "abbreviation".
What would be the best way to do this? Keeping in mind that some words can have the same abbreviation and that the user is free to type either the full word or the abbreviation?
So far I tried to subclass NSLayoutManager.
Assuming I get an abbreviated string and I have to draw the full word, I would implement the following method:
    -(void)setGlyphs:(const CGGlyph *)glyphs
          properties:(const NSGlyphProperty *)props
    characterIndexes:(const NSUInteger *)charIndexes
                font:(UIFont *)aFont
       forGlyphRange:(NSRange)glyphRange
    {
        NSUInteger length = glyphRange.length;
        NSString *sourceString = @"a very long string as a source of characters for substitution"; //temp.
        unichar *characters = malloc(sizeof(unichar) * length+4);
        CGGlyph *subGlyphs = malloc(sizeof(CGGlyph) * length+4);
        [sourceString getCharacters:characters
                              range:NSMakeRange(0, length+4)];
        CTFontGetGlyphsForCharacters((__bridge CTFontRef)(aFont),
                                     characters,
                                     subGlyphs,
                                     length+4);
        [super setGlyphs:subGlyphs
              properties:props
        characterIndexes:charIndexes
                    font:aFont
           forGlyphRange:NSMakeRange(glyphRange.location, length+4)];
    }
However this method complains about invalid glyph indices "_NSGlyphTreeInsertGlyphs invalid char index" when I try to insert 4 additional glyphs.
You're barking way up the wrong tree; trying to subclass NSLayoutManager in this situation is overkill. Your problem is merely one of swapping text stretches (replace abbrev by original or original by abbrev), so just do that - in the text, the underlying NSMutableAttributedString being displayed.
You say in a comment "some words map to the same abbreviation". No problem. Assuming you know the original word (the problem would not be solvable if you did not), store that original word as part of the NSMutableAttributedString, i.e. as an attribute in the place where the word is. Thus, when you substitute the abbreviation, the attribute remains, and thus the original word is retained, ready for you when you need to switch it back.
For example, given the string @"I love New York", you can hide the word "New York" as an attribute in the same stretch of text occupied by "New York":
    [attributedString addAttribute:@"realword" value:@"New York" range:NSMakeRange(7,8)];
Now you can set that range's text to @"NY" but the attribute remains, and you can consult it when the time comes to switch the text back to the unabbreviated form.
(I have drawn out this answer at some length because many people are unaware that you are allowed to define your own arbitrary NSAttributedString attributes. It's an incredibly useful thing to do.)

numerical value of a unicode character in objective c

Is it possible to get a numerical value from a Unicode character in Objective-C?
@"A" is 0041, @"➜" is 279C, @"Ω" is 03A9, @"झ" is 091D... ?
OK, so it’s perhaps worth pointing a few things out in a separate answer here. First, the term “character” is ambiguous, so we should choose a more appropriate term depending on what we mean. (See Characters and Grapheme Clusters in the Apple developer docs, as well as the Unicode website for more detail.)
If you are asking for the UTF-16 code unit, then you can use
    unichar ch = [myString characterAtIndex:ndx];
Note that this is only equivalent to a Unicode code point when the code point is within the Basic Multilingual Plane (i.e. it is U+FFFF or below).
If you are asking for the Unicode code point, then you should be aware that UTF-16 supports characters outside of the BMP (i.e. U+10000 and above) using surrogate pairs. Thus there will be two UTF-16 code units for any code point at or above U+10000. To detect this case, you need to do something like
    uint32_t codepoint = [myString characterAtIndex:ndx];
    if ((codepoint & 0xfc00) == 0xd800) {
        unichar ch2 = [myString characterAtIndex:ndx + 1];
        codepoint = (((codepoint & 0x3ff) << 10) | (ch2 & 0x3ff)) + 0x10000;
    }
Note that in production code, you should also test for and cope with the case where the surrogate pair has been truncated somehow.
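One way to cope with truncated or unpaired surrogates is sketched below in plain C, operating on a buffer of UTF-16 code units (for example one filled by -getCharacters:range:). The function name `decode_code_point` and its parameters are hypothetical, and returning false on malformed input is just one possible policy; substituting U+FFFD would be another.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes the code point that starts at units[index].
   Returns false, leaving *out untouched, when index is out of range,
   when the unit is a lone low surrogate, or when a high surrogate is
   not followed by a low surrogate (including truncation at the end). */
bool decode_code_point(const uint16_t *units, size_t length, size_t index, uint32_t *out) {
    if (index >= length) {
        return false;
    }
    uint32_t first = units[index];
    if ((first & 0xfc00) == 0xdc00) {
        return false; /* low surrogate with no preceding high surrogate */
    }
    if ((first & 0xfc00) != 0xd800) {
        *out = first; /* ordinary BMP code unit */
        return true;
    }
    if (index + 1 >= length) {
        return false; /* high surrogate truncated at the end of the buffer */
    }
    uint32_t second = units[index + 1];
    if ((second & 0xfc00) != 0xdc00) {
        return false; /* high surrogate not followed by a low surrogate */
    }
    *out = (((first & 0x3ff) << 10) | (second & 0x3ff)) + 0x10000;
    return true;
}
```

For the pair 0xD834 0xDD1E this yields U+1D11E; the same pair cut off after the first unit is reported as an error instead of being decoded into garbage.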
Importantly, neither UTF-16 code units nor Unicode code points necessarily correspond to anything that an end-user would regard as a “character” (the Unicode consortium generally refers to this as a grapheme cluster to distinguish it from other possible meanings of “character”). There are many examples, but the simplest to understand are probably the combining diacritical marks. For instance, the character ‘Ä’ can be represented as the Unicode code point U+00C4, or as a pair of code points, U+0041 U+0308.
Sometimes people (like @DietrichEpp in the comments on his answer) will claim that you can deal with this by converting to precomposed form before dealing with your string. This is something of a red herring, because precomposed form only deals with characters that have a precomposed equivalent in Unicode. For example, it will not help with all combining marks; it will not help with Indic or Arabic scripts; it will not help with Hangul Jamos. There are many other cases as well.
If you are trying to manipulate grapheme clusters (things the user might think of as “characters”), you should probably make use of the NSString methods -rangeOfComposedCharacterSequencesForRange: and -rangeOfComposedCharacterSequenceAtIndex:, or the CFString function CFStringGetRangeOfComposedCharactersAtIndex. Obviously you cannot hold a grapheme cluster in an integer variable and it has no inherent numerical value; rather, it is represented by a string of code points, which are represented by a string of code units. For instance:
    NSRange gcRange = [myString rangeOfComposedCharacterSequenceAtIndex:ndx];
    NSString *graphemeCluster = [myString substringWithRange:gcRange];
Note that graphemeCluster may be arbitrarily long(!)
Even then, we have ignored the effects of matters such as Unicode’s support for bidirectional text. That is, the order of the code points represented by the code units in your NSString may in some cases be the reverse of what you might expect. The worst cases involve things like English text embedded in Arabic or Hebrew; this is supported by the Cocoa Text system, and so you really can end up with bidirectional strings in your code.
To summarise: generally speaking one should avoid examining NSString and CFString instances unichar by unichar. If at all possible, use an appropriate NSString method or CFString function instead. If you do find yourself examining the UTF-16 code units, please familiarise yourself with the Unicode standard first (I recommend “Unicode Demystified” if you can’t stomach reading through the Unicode book itself), so that you can avoid the major pitfalls.
Cocoa strings allow you to access the UTF-16 elements using -characterAtIndex:, so the following code will convert the string to a unicode code point:
    unsigned strToChar(NSString *str)
    {
        unsigned c1, c2;
        c1 = [str characterAtIndex:0];
        if ((c1 & 0xfc00) == 0xd800) {
            c2 = [str characterAtIndex:1];
            return (((c1 & 0x3ff) << 10) | (c2 & 0x3ff)) + 0x10000;
        } else {
            return c1;
        }
    }
I am not aware of any convenience functions for this. You can use -characterAtIndex: by itself if you are okay with your code breaking horribly when someone uses characters outside the BMP; a number of applications on OS X break horribly in this way.
The following should render as a musical "G clef", U+1D11E, but if you copy and paste it into some text editors (TextMate), they'll let you do bizarre things like delete half of the character, at which point your text file is garbage.
𝄞

How to find out if there is an "." in an NSString?

I have an
    NSString *str = @"12345.6789";
and want to find out whether there is a "." character inside it. I'm afraid there might be ugly char-encoding issues if I just try to match an @"." against this. How would you do it to make sure it always finds a match if there is one?
I just need to know that there is a dot in there. Everything else doesn't matter.
You can use the rangeOfString: message to get the range where your "." is.
The prototype is:
- (NSRange)rangeOfString:(NSString *)aString
You can find more info about this message in: Mac Dev Center
It would look something like this:
    NSRange range;
    range = [yourstring rangeOfString:@"."];
    NSLog(@"Position:%lu", (unsigned long)range.location);
If you need to, there is another message ( rangeOfString:options: ) where you can add some options like "Case sensitive" and so on.
If [str rangeOfString:@"."] returns anything other than {NSNotFound, 0}, the search string was found in the receiver. There are no encoding issues, as NSString takes care of encoding. However, there might be issues if your str is user-provided and could contain a different decimal separator (e.g., a comma). But then, if str really comes from the user, many other things could go wrong with that comparison anyway.
To check for the "." symbol, this can be useful:
    if ([[str componentsSeparatedByString:@"."] count] > 1) {
        NSLog(@"dot is there");
    } else {
        NSLog(@"dot is not there");
    }
If what you really want to do is determine whether the string represents a number with a fractional part, a better solution is to feed the string to a number formatter, then examine the number's doubleValue to see whether it has a fractional part.
For the latter step, one way would be to use the modf function, which returns both the fractional part (directly) and the integral part (by reference). If the magnitude of the fractional part is greater than zero (or greater than some appropriately small tolerance of your choosing), then the number has a fractional part.
The reason why this is better is because not everybody writes decimal fractions in the “12345.6789” format. Some countries use a comma instead, and I'm sure that's not the only variation. Let the number formatter handle such cases for you.
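The modf step can be sketched in plain C as follows. It assumes the string has already been turned into a double (for instance via a number formatter and doubleValue); the name `has_fractional_part` and the `tolerance` parameter are hypothetical.

```c
#include <math.h>
#include <stdbool.h>

/* Returns true when `value` has a fractional part whose magnitude exceeds
   `tolerance`. modf returns the fractional part with the same sign as
   `value`, so fabs is used to handle negative numbers too. */
bool has_fractional_part(double value, double tolerance) {
    double integral;
    double fraction = modf(value, &integral);
    return fabs(fraction) > tolerance;
}
```

A call such as has_fractional_part(12345.6789, 1e-9) reports a fractional part, while a whole number like 42.0 does not.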
I wrote a little method to make things a little more natural if you use this sort of thing a whole bunch in your project:
    +(BOOL)seeIfString:(NSString*)thisString ContainsThis:(NSString*)containsThis
    {
        NSRange textRange = [[thisString lowercaseString] rangeOfString:[containsThis lowercaseString]];
        if(textRange.location != NSNotFound)
            return YES;
        return NO;
    }
Enjoy!