NSLinguisticTagger Language is "und"

I'm trying to determine the language of a string. If I pass the string in from a variable, the tagger reports the language as "und", but if I pass it in as a literal,
[tagger setString:[NSString stringWithFormat:@"Example 2 Three people have attached a rope around your belly and pull it with the indicated forces. The sketch isn’t true to scale and shows the situation froth above. a) Detennine with a drawing, in what direction you are pulled (assuming that you don’t put up any resistance) Choose a scale of l00N 2 Ion . b) What is the inﬂuence of the lengths of the ropes?"]];
it recognises the language correctly, even though the text is the same in both cases. Here is the code that uses the variable:
//Recognize Language for Output
{
NSArray *tagschemes = [NSArray arrayWithObjects:NSLinguisticTagSchemeLanguage, nil];
NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc] initWithTagSchemes:tagschemes options:0];
[tagger setString:[NSString stringWithFormat:@"%@", text]];
NSString *language = [tagger tagAtIndex:0 scheme:NSLinguisticTagSchemeLanguage tokenRange:NULL sentenceRange:NULL];
output = [NSString stringWithFormat:@"The language is %@\rand the following words were found:\r\r%@", language, output];
}
What kind of language is "und", and why doesn't it recognize the language correctly as "en"?

From the docs for NSOrthography:
the tag und is used if a specific language cannot be determined. ("und" is the ISO 639 code for "undetermined".)

The problem was that the text wasn't formatted properly: it still contained the line breaks from the OCR output. The original text in the "text" string looked like this:
2014-01-28 18:13:51.412 Tesseract[35357:70b] Old text: Example 2
Three people have attached a rope around your belly
and pull it with the indicated forces. The sketch isn’t
true to scale and shows the situation from above.
a) Detennine with a drawing, in what direction you are
pulled (assuming that you don’t put up any resistance)
Choose a scale of l00N 2 tom .
b) What is the influence of the lengths of the ropes?
If the text is reformatted using
text = [text stringByReplacingOccurrencesOfString:@"\n" withString:@" "];
it looks like this, and NSLinguisticTagger recognises the language correctly:
2014-01-28 18:13:51.412 Tesseract[35357:70b] Reformatted text: Example 2 Three people have attached a rope around your belly and pull it with the indicated forces. The sketch isn’t true to scale and shows the situation from above. a) Detennine with a drawing, in what direction you are pulled (assuming that you don’t put up any resistance) Choose a scale of l00N 2 tom . b) What is the influence of the lengths of the ropes?
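A minimal sketch of the fix described above, assuming ARC and an Apple platform where NSLinguisticTagger is available. FlattenLineBreaks and DetectedLanguage are hypothetical helper names, and mapping "und" to nil is just one way to surface the undetermined case to the caller. Splitting on the whole newline character set (rather than only "\n") also catches "\r" and other line separators that OCR output may contain.

```objc
#import <Foundation/Foundation.h>

// Replace each newline-type character with a space so the tagger sees one
// continuous run of prose instead of short, fragmentary lines.
static NSString *FlattenLineBreaks(NSString *text) {
    NSArray<NSString *> *lines =
        [text componentsSeparatedByCharactersInSet:[NSCharacterSet newlineCharacterSet]];
    return [lines componentsJoinedByString:@" "];
}

// Returns the language tag at the first character, or nil when the tagger
// can only answer "und" (undetermined) or the string is empty.
static NSString *DetectedLanguage(NSString *text) {
    if (text.length == 0) {
        return nil;
    }
    NSLinguisticTagger *tagger =
        [[NSLinguisticTagger alloc] initWithTagSchemes:@[NSLinguisticTagSchemeLanguage]
                                               options:0];
    tagger.string = FlattenLineBreaks(text);
    NSString *language = [tagger tagAtIndex:0
                                     scheme:NSLinguisticTagSchemeLanguage
                                 tokenRange:NULL
                              sentenceRange:NULL];
    return [language isEqualToString:@"und"] ? nil : language;
}
```

Callers can then treat a nil result as "language unknown" instead of printing "und" to the user.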

Related

how to insert extra glyphs?

I want a UITextView to switch between two display modes.
In mode 1 it should show abbreviations, and in mode 2 the full words. For example, "Abbr." vs. "abbreviation".
What would be the best way to do this, keeping in mind that some words can have the same abbreviation and that the user is free to type either the full word or the abbreviation?
So far I have tried subclassing NSLayoutManager.
Assuming I get an abbreviated string and have to draw the full word, I would implement the following method:
-(void)setGlyphs:(const CGGlyph *)glyphs
      properties:(const NSGlyphProperty *)props
characterIndexes:(const NSUInteger *)charIndexes
            font:(UIFont *)aFont
   forGlyphRange:(NSRange)glyphRange
{
    NSUInteger length = glyphRange.length;
    NSUInteger newLength = length + 4; // 4 additional glyphs
    NSString *sourceString = @"a very long string as a source of characters for substitution"; //temp.
    // Parenthesized size: room for newLength elements, not (size * length) + 4 bytes
    unichar *characters = malloc(sizeof(unichar) * newLength);
    CGGlyph *subGlyphs = malloc(sizeof(CGGlyph) * newLength);
    [sourceString getCharacters:characters
                          range:NSMakeRange(0, newLength)];
    CTFontGetGlyphsForCharacters((__bridge CTFontRef)(aFont),
                                 characters,
                                 subGlyphs,
                                 newLength);
    // Note: props and charIndexes still hold only `length` entries,
    // but this range asks the superclass to read newLength of them.
    [super setGlyphs:subGlyphs
          properties:props
    characterIndexes:charIndexes
                font:aFont
       forGlyphRange:NSMakeRange(glyphRange.location, newLength)];
    free(characters);
    free(subGlyphs);
}
However, this fails with "_NSGlyphTreeInsertGlyphs invalid char index" when I try to insert 4 additional glyphs.

You're barking way up the wrong tree; subclassing NSLayoutManager in this situation is overkill. Your problem is merely one of swapping stretches of text (replacing the abbreviation with the original, or the original with the abbreviation), so do just that, in the text: the underlying NSMutableAttributedString being displayed.
You say in a comment that "some words map to the same abbreviation". No problem. Assuming you know the original word (the problem would not be solvable if you did not), store that original word as part of the NSMutableAttributedString, i.e. as an attribute on the range where the word is. When you substitute the abbreviation, the attribute remains, so the original word is retained, ready for when you need to switch it back.
For example, given the string @"I love New York", you can hide the words "New York" as an attribute on the same stretch of text that "New York" occupies:
[attributedString addAttribute:@"realword" value:@"New York" range:NSMakeRange(7,8)];
Now you can set that range's text to @"NY" (for example with replaceCharactersInRange:withString:, whose new characters inherit the attributes of the first replaced character), and the attribute remains, so you can consult it when the time comes to switch the text back to the unabbreviated form.
(I have drawn out this answer at some length because many people are unaware that you are allowed to define your own arbitrary NSAttributedString attributes. It's an incredibly useful thing to do.)
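A minimal sketch of that round trip, assuming ARC. The "realword" key comes from the example above; Abbreviate and ExpandAll are hypothetical helper names. ExpandAll collects the marked ranges first and then replaces them from the end of the string, so earlier ranges stay valid while the text changes length.

```objc
#import <Foundation/Foundation.h>

// Custom attribute key from the example above; any string you define works.
static NSString *const RealWordAttributeName = @"realword";

// Stores the full word in a custom attribute over `range`, then replaces the
// text with `abbreviation`. The new characters inherit the attributes of the
// first replaced character, so the stored word survives the substitution.
static void Abbreviate(NSMutableAttributedString *text, NSRange range, NSString *abbreviation) {
    NSString *fullWord = [text.string substringWithRange:range];
    [text addAttribute:RealWordAttributeName value:fullWord range:range];
    [text replaceCharactersInRange:range withString:abbreviation];
}

// Restores every abbreviated stretch to the full word stored in its attribute.
static void ExpandAll(NSMutableAttributedString *text) {
    NSMutableArray<NSValue *> *ranges = [NSMutableArray array];
    NSMutableArray<NSString *> *words = [NSMutableArray array];
    // First pass: record where the stored words are, without mutating.
    [text enumerateAttribute:RealWordAttributeName
                     inRange:NSMakeRange(0, text.length)
                     options:0
                  usingBlock:^(id value, NSRange range, BOOL *stop) {
        if (value != nil) {
            [ranges addObject:[NSValue valueWithRange:range]];
            [words addObject:value];
        }
    }];
    // Second pass: replace from the end so earlier ranges remain valid.
    for (NSInteger i = (NSInteger)ranges.count - 1; i >= 0; i--) {
        NSRange range = ranges[i].rangeValue;
        NSString *word = words[i];
        [text replaceCharactersInRange:range withString:word];
        // Drop the marker now that the full word is back in place.
        [text removeAttribute:RealWordAttributeName
                        range:NSMakeRange(range.location, word.length)];
    }
}
```

One caveat of this sketch: if two abbreviated stretches with the same stored word touch with no characters between them, the enumeration reports them as a single run.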

cocos2d frame rate lag on dictionary creation and search

I am trying to create a simple iPhone game that, throughout the course of running, repeatedly checks whether user input is a real word. I have a 1.7 MB text file (is this a reasonable size?) containing all of the words in the English language, one word per line. The code below runs in the init method of the game scene. correctWords is an array that will contain all of the user's verified word guesses. The code parses the text file and puts all of the words into an array called currentDict:
correctWords = [[NSMutableArray alloc] init];
//set where to get the dictionary from
NSString *filePath = [[NSBundle mainBundle] pathForResource: [NSString stringWithFormat: @"dictionary"] ofType:@"txt"];
//pull the content from the file into memory
NSData* data = [NSData dataWithContentsOfFile:filePath];
//convert the bytes from the file into a string
NSString* string = [[[NSString alloc] initWithBytes:[data bytes]
length:[data length]
encoding:NSUTF8StringEncoding] autorelease];
//split the string around newline characters to create an array
NSString* delimiter = @"\n";
currentDict = [string componentsSeparatedByString:delimiter];
[currentDict retain];
Then, to verify that the user's input is in fact a word, I have this check:
if([currentDict containsObject: userInput]){
Whenever the game scene loads, there is a very noticeable delay (3-4 seconds) on the device itself, although it happens almost instantly in the simulator. I also have animations running throughout most of the game, and whenever the game verifies a word there is a slight but noticeable lag in the animations. Is there a better way to load the dictionary into memory, or some standard practice for verifying words? Also, why would checking whether something is a word cause the animation to lag? I had assumed the animation ran on its own thread (and thus would theoretically not be affected).

I would recommend an alternative approach. I don't know how your game works, but it might make sense to give the player a limited set of possible word choices, as in Draw Something, where there are only so many words you could type; then you would test against far fewer. Before the scene loads, you can select the set of possible words from your dictionary, then provide letters or options (whatever your game is doing) that only allow the user to come up with words in that set. Then you can test against a small set.
Another option is to repeat the above frequently throughout your level, so the set of available words is constantly changing, but load each new set when you're not in the middle of an animation. If there is a brief pause in gameplay as the level gets harder, load new words then, or something similar.
That way the real-time game play is not affected by a large dictionary but you can still offer many options throughout the gameplay.

It's not surprising that comparing against thousands of strings takes time and causes lag in the animation: -[NSArray containsObject:] checks the elements one by one. You should read up on binary search, hashing, etc. Also, loading the entire file into an NSString and then splitting it is very slow. Your code is just awful, sorry.
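A minimal sketch of the hashing suggestion, assuming ARC: load the words once into an NSSet, whose containsObject: is a hash lookup rather than a scan of every element. WordSetFromText and IsKnownWord are hypothetical helper names, and lowercasing both the list and the input is an assumption about how the game wants to compare words.

```objc
#import <Foundation/Foundation.h>

// Builds a hash-based set from newline-separated words (e.g. the contents of
// dictionary.txt). Blank lines are skipped and every word is lowercased.
static NSSet<NSString *> *WordSetFromText(NSString *text) {
    NSMutableSet<NSString *> *words = [NSMutableSet set];
    [text enumerateLinesUsingBlock:^(NSString *line, BOOL *stop) {
        NSString *word =
            [line stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceCharacterSet]];
        if (word.length > 0) {
            [words addObject:word.lowercaseString];
        }
    }];
    return [words copy];
}

// Hash lookup instead of the linear -[NSArray containsObject:] scan.
static BOOL IsKnownWord(NSSet<NSString *> *words, NSString *input) {
    return [words containsObject:input.lowercaseString];
}
```

Building the set still reads the whole file, so it may be worth doing once, for example on a background queue with dispatch_async and handing the finished set back to the main thread, rather than in the scene's init. That part is a suggestion, not something measured in the answers above.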

Objective-C: format numbers to ordinals: 1, 2, 3, .. to 1st, 2nd, 3rd

In Objective-C, is there any way to format an integer as an ordinal
(1 => "1st", 2 => "2nd", etc.) that works for any language?
So if the user is French, they will see "1er", "2ème", etc.
Thanks a lot!
Edit:
This is for an iOS app.

Have you taken a look at TTTOrdinalNumberFormatter, which is in FormatterKit? It works great, and I'm pretty sure it's exactly what you're looking for.
Here's an example taken from the kit:
TTTOrdinalNumberFormatter *ordinalNumberFormatter = [[TTTOrdinalNumberFormatter alloc] init];
[ordinalNumberFormatter setLocale:[NSLocale currentLocale]];
[ordinalNumberFormatter setGrammaticalGender:TTTOrdinalNumberFormatterMaleGender];
NSNumber *number = [NSNumber numberWithInteger:2];
NSLog(@"%@", [NSString stringWithFormat:NSLocalizedString(@"You came in %@ place!", nil), [ordinalNumberFormatter stringFromNumber:number]]);
Assuming you've provided localized strings for "You came in %@ place!", the output would be:
* English: "You came in 2nd place!"
* French: "Vous êtes venu à la 2eme place!"
* Spanish: "Usted llegó en 2.o lugar!"

The solution is available directly from NSNumberFormatter (NSNumberFormatterOrdinalStyle requires iOS 9 / OS X 10.11 or later):
- (NSString *)getOrdinalStringFromInteger:(NSInteger)integer
{
NSNumberFormatter *formatter = [[NSNumberFormatter alloc] init];
[formatter setLocale:[NSLocale currentLocale]];
[formatter setNumberStyle:NSNumberFormatterOrdinalStyle];
return [formatter stringFromNumber:[NSNumber numberWithInteger:integer]];
}

You could use ICU, which includes a way of doing what you describe:
http://icu-project.org/apiref/icu4c/classRuleBasedNumberFormat.html
You don't say what context you're using Objective-C in, but if you're writing for Cocoa, ICU is actually present. However, reaching down to talk to it directly can be a bit tricky.
[edited to link to someone who actually seems to have figured out how to build ICU and link it]
How to build ICU so I can use it in an iPhone app?

You need a rule set for each language you want to support. Any language is asking too much: they are all wildly different. First, create a rule set class which holds the regular and the exception cases for a given language. That class needs a single method that takes a number and returns a string suffix (or the number plus the suffix.) Create rule set instances (statically) for each language you care about.
Then create a category on NSNumber that returns a suffix pulled from the appropriate rule set for whatever language the user needs (system locale, or some choice they make, or case by case.)
Each language has different rules, of course. For example, English is relatively complicated:
1st, 2nd, 3rd, 4th, 5th, ... 20th
and then it starts again with st, nd, rd, th... Numbers ending in 1, 2 and 3 take st, nd and rd, except those ending in 11, 12 and 13 (11th, 12th, 13th, 111th), which take th. Zero is 'th' (zeroth, hundredth, millionth etc.)
French is different: 1er, then it's xième all the way up. (In writing these are usually abbreviated: 1er, or 1re in the feminine, then just 'e', as in 2e, making French quite easy.)
Japanese gets very odd. Cardinal 1, 2, 3, 4 (ichi, ni, san, yon) become tsuitachi, futsuka, mikka and yokka when used for days of the month. Those aren't suffixes, though: the numbers are named differently in that context. Luckily, because that's incredibly confusing, you can just stick the kanji 'kai' character 回 (which looks like a box in a box) after the number and everyone knows what you mean.
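The English rules described above can be sketched as a single function standing in for the answer's per-language "rule set" class. This is a minimal, English-only sketch for non-negative numbers; EnglishOrdinalSuffix and EnglishOrdinal are hypothetical names.

```objc
#import <Foundation/Foundation.h>

// English ordinal suffix: numbers ending in 1, 2 or 3 take "st", "nd", "rd",
// except those ending in 11, 12 or 13; everything else (including 0) takes "th".
static NSString *EnglishOrdinalSuffix(NSUInteger number) {
    NSUInteger lastTwo = number % 100;
    if (lastTwo >= 11 && lastTwo <= 13) {
        return @"th";
    }
    switch (number % 10) {
        case 1: return @"st";
        case 2: return @"nd";
        case 3: return @"rd";
        default: return @"th";
    }
}

// Number plus suffix, e.g. 22 -> "22nd".
static NSString *EnglishOrdinal(NSUInteger number) {
    return [NSString stringWithFormat:@"%lu%@", (unsigned long)number, EnglishOrdinalSuffix(number)];
}
```

On iOS 9 and later, NSNumberFormatterOrdinalStyle covers English and other locales, so a hand-written rule like this mainly makes sense for older targets or custom rules.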

Swift:
import Foundation
func getOrdinalDegreeValue(_ number: Int) -> String? {
    let formatter = NumberFormatter()
    formatter.locale = Locale.current
    formatter.numberStyle = .ordinal
    return formatter.string(from: NSNumber(value: number))
}
getOrdinalDegreeValue(1) // "1st" with an English locale

UITextChecker 25 Letter Words

I believe this is an Apple bug, but wanted to run it by you all and see if anyone else had run into the same/similar issues.
Simply put, Apple's UITextChecker treats every word of 25 letters or more as valid and correctly spelled. Go ahead and open Notes on your iOS device (or TextEdit on OS X) and type in a random 24-letter word. Hit enter: underlined in red, right? Now add one more letter so it is a 25-letter word. Hit enter again: underlined in red, right? ... Nope!
I don't know if this is related, but I have a similar unanswered question out there (UITextChecker is what dictionary?) asking which dictionary UITextChecker uses. In /usr/share/dict/words the longest word is 24 letters. It seems rather coincidental that 25 letters is the first word length not in that list, and that such words are always accepted as valid. But I don't know whether that word list is the dictionary UITextChecker uses.
This is important to note for anyone who might be confirming the spelling of a given word for something like a game. You really don't want players to be able to use 25 random letters to spell a word and most likely score massive points.
Here's my code to check for valid words:
- (BOOL) isValidWord:(NSString*)word {
// word is all lowercase
UITextChecker *checker = [[UITextChecker alloc] init];
NSRange searchRange = NSMakeRange(0, [word length]);
NSRange misspelledRange = [checker rangeOfMisspelledWordInString:word range:searchRange startingAt:0 wrap:NO language:@"en" ];
[checker release];
BOOL validWord = (misspelledRange.location == NSNotFound);
BOOL passOneCharTest = ([word length] > 1 || [word isEqualToString:@"a"] || [word isEqualToString:@"i"]);
BOOL passLengthTest = ([word length] > 0 && [word length] < 25); // I don't know any words more than 24 letters long
return validWord && passOneCharTest && passLengthTest;
}
So my question to the community: is this a documented 'feature' that I just haven't been able to locate?

This is likely caused by the spell-checking algorithm itself, although I admit it sounds like a bit of a hole.
Even spell-checkers that use a dictionary often use an algorithm to get rid of false negatives. The classic is to ignore:
(a) single-character words followed by certain punctuation (like that (a) back there); and
(b) words consisting of all uppercase like NATO or CHOGM, assuming that they're quite valid acronyms.
If the algorithm for UITextChecker also considers 25+-letter words to be okay, that's just one of the things you need to watch out for.
It may well be related to the expected use case: it may be intended not so much as a perfect checker but more as a best-guess solution.
If you really want a perfect filter, you're probably better off doing your own, using a copy of the dictionary from somewhere. That way, you can exclude things that aren't valid in your game (acronyms in Scrabble®, for example).
You can also ensure you're not subject to the vagaries of algorithms that assume longer words are valid as appears to be the case here. Instead you could just assume any word not in your dictionary is invalid (but, of course, give the user the chance to add it if your dictionary is wrong).
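A minimal sketch of the "roll your own" approach, assuming ARC; GameDictionary and its methods are hypothetical names. Anything not in the word list (or added by the player) is rejected regardless of length, so a 25-letter string of random letters fails. Lookups are case-insensitive because every word is lowercased on the way in, and player-added words live in a separate set so they can be saved or cleared on their own.

```objc
#import <Foundation/Foundation.h>

@interface GameDictionary : NSObject
- (instancetype)initWithWords:(NSArray<NSString *> *)words;
- (BOOL)isValidWord:(NSString *)word;
- (void)addUserWord:(NSString *)word;
@end

@implementation GameDictionary {
    NSSet<NSString *> *_baseWords;         // fixed list shipped with the game
    NSMutableSet<NSString *> *_userWords;  // words the player chose to add
}

- (instancetype)initWithWords:(NSArray<NSString *> *)words {
    if ((self = [super init])) {
        NSMutableSet<NSString *> *lowered = [NSMutableSet setWithCapacity:words.count];
        for (NSString *w in words) {
            [lowered addObject:w.lowercaseString];
        }
        _baseWords = [lowered copy];
        _userWords = [NSMutableSet set];
    }
    return self;
}

// Valid only if the word is in the shipped list or was added by the player.
- (BOOL)isValidWord:(NSString *)word {
    NSString *key = word.lowercaseString;
    return [_baseWords containsObject:key] || [_userWords containsObject:key];
}

- (void)addUserWord:(NSString *)word {
    [_userWords addObject:word.lowercaseString];
}
@end
```

Game-specific exclusions, such as the acronyms mentioned above, are then handled simply by leaving them out of the shipped list.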
Other than that, and filing a query/bug with Apple, there's probably not much else you can do.

appendAttributedString: in NSMutableAttributedString

I have an NSMutableAttributedString, displayContent.
The attributes of the content vary across the string,
i.e. the colours and font sizes can vary by letter.
I want to add a new character to the end of the string and for it to pick up the attributes of the last character in displayContent. I cannot know what those attributes are in advance since they are under user control.
When I append the new character (tempAttr):
NSAttributedString * tempAttr = [[NSAttributedString alloc] initWithString:appendage];
[displayContent appendAttributedString:tempAttr];
it appears to reset the attributes of the whole string to the attributes of the new character (which I haven't set since I can't know what they need to be).
How do I get tempAttr to pick up the attributes of the last character in displayContent?
Thanks.
Update.
Made progress on this in a clumsy but functional way.
Copy the attributes dictionary from the last character in the display (displayContent) and then reapply those attributes to the new character being added:
NSMutableDictionary * lastCharAttrs = [NSMutableDictionary dictionaryWithCapacity:5];
if ([displayContent length] > 0) {
    // get style of last letter (index length - 1; index 0 is the first letter)
    [lastCharAttrs addEntriesFromDictionary:[displayContent attributesAtIndex:[displayContent length] - 1
                                                               effectiveRange:NULL]];
}
NSMutableAttributedString * tempAttr = [[NSMutableAttributedString alloc] initWithString:newCharacter
                                                                             attributes:lastCharAttrs];
[displayContent appendAttributedString:tempAttr]; // Append to content in the display field
I would have hoped there was a more elegant way to do this like setting a property of the NSTextField.

I think I discovered a solution to this by accident, then found this page while looking for the answer to the problem I created for myself (the opposite of your issue).
If you do the following:
[[displayContent mutableString] appendString:newCharacter];
You'll end up with newCharacter appended and the previous attributes "stretched" to cover it. I cannot find this behavior documented anywhere, however, so you might be wary of counting on it.
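A related behavior is described in the NSMutableAttributedString reference for replaceCharactersInRange:withString:: when the range is empty, the new characters inherit the attributes of the preceding character, if there is one. A minimal sketch that relies on that rule instead, assuming ARC; AppendInheritingAttributes is a hypothetical helper name.

```objc
#import <Foundation/Foundation.h>

// Appends `suffix` by "replacing" the empty range at the end of the string.
// Characters inserted into a zero-length range take on the attributes of the
// character before it, so the suffix picks up the last character's style.
static void AppendInheritingAttributes(NSMutableAttributedString *text, NSString *suffix) {
    [text replaceCharactersInRange:NSMakeRange(text.length, 0) withString:suffix];
}
```

With an empty string there is no preceding character, so the suffix is simply appended without attributes.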

We Keep Coding
