Slow attribute enumeration inside NSAttributedString - objective-c

I'm building an app that relies on string attributes to display different sorts of data while editing text. Certain attributes are rendered differently on basically every keypress, but only within the current line's range.
This is done with a simple enumeration block and a line object that knows its own range inside the attributed string:
- (void)formatLine:(Line*)line {
    // Line knows its range inside the text view / attributed string
    [textView.textStorage enumerateAttributesInRange:line.textRange
                                             options:0
                                          usingBlock:^(NSDictionary<NSAttributedStringKey,id> * _Nonnull attrs,
                                                       NSRange range, BOOL * _Nonnull stop) {
        // Set background etc.
    }];
}
However, it turns out that enumerating an attributed range is very, very slow, especially when the NSMutableAttributedString is long, even though the enumerated ranges themselves are usually only 100-300 characters. Setting the visual attributes (such as background/foreground color) doesn't make much difference; the bottleneck is the enumeration itself, or maybe requesting a range for enumeration?
Is there a smarter way to retrieve ranges for attributes, or should I just scrap my current logic and start over?
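One thing that may be worth trying (an untested suggestion, not something measured in this thread) is to pass NSAttributedStringEnumerationLongestEffectiveRangeNotRequired, which, as I understand Apple's documentation, lets Foundation skip coalescing adjacent runs into their longest effective range, and to wrap the attribute changes in beginEditing/endEditing so the text storage processes them as one edit. The helper below is hypothetical; it applies an attribute to every run in the line and returns how many characters it visited, so the behaviour can be checked.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: applies `key`/`value` to every run inside `lineRange`
// and returns how many characters the enumeration visited.
static NSUInteger FormatLineRange(NSMutableAttributedString *text, NSRange lineRange,
                                  NSAttributedStringKey key, id value) {
    __block NSUInteger visited = 0;
    // Batch all attribute changes into a single edit (matters most for NSTextStorage).
    [text beginEditing];
    // Do not ask Foundation to merge adjacent runs into their longest
    // effective range; only the runs inside lineRange are needed.
    [text enumerateAttributesInRange:lineRange
                             options:NSAttributedStringEnumerationLongestEffectiveRangeNotRequired
                          usingBlock:^(NSDictionary<NSAttributedStringKey, id> *attrs,
                                       NSRange range, BOOL *stop) {
        visited += range.length;
        // Mutating inside the block is allowed as long as it stays within `range`.
        [text addAttribute:key value:value range:range];
    }];
    [text endEditing];
    return visited;
}
```

Whether this helps depends on where the time actually goes, so it is worth profiling both variants on a long document.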

Related

how to insert extra glyphs?

I want a UITextView to switch between two display modes.
In mode 1 it should show abbreviations, and in mode 2 the full words. For example "Abbr." vs "abbreviation".
What would be the best way to do this, bearing in mind that some words can have the same abbreviation and that the user is free to type either the full word or the abbreviation?
So far I tried to subclass NSLayoutManager.
Assuming I get an abbreviated string and I have to draw the full word, I would implement the following method:
-(void)setGlyphs:(const CGGlyph *)glyphs
      properties:(const NSGlyphProperty *)props
characterIndexes:(const NSUInteger *)charIndexes
            font:(UIFont *)aFont
   forGlyphRange:(NSRange)glyphRange
{
    NSUInteger length = glyphRange.length;
    NSString *sourceString = @"a very long string as a source of characters for substitution"; //temp.
    // Allocate room for length + 4 elements (not length elements plus 4 bytes)
    unichar *characters = malloc(sizeof(unichar) * (length + 4));
    CGGlyph *subGlyphs = malloc(sizeof(CGGlyph) * (length + 4));
    [sourceString getCharacters:characters
                          range:NSMakeRange(0, length + 4)];
    CTFontGetGlyphsForCharacters((__bridge CTFontRef)(aFont),
                                 characters,
                                 subGlyphs,
                                 length + 4);
    // Note: props and charIndexes still hold entries for only `length` glyphs,
    // although the glyph range passed on below claims length + 4
    [super setGlyphs:subGlyphs
          properties:props
    characterIndexes:charIndexes
                font:aFont
       forGlyphRange:NSMakeRange(glyphRange.location, length + 4)];
    free(characters);
    free(subGlyphs);
}
However, this method complains about invalid glyph indices ("_NSGlyphTreeInsertGlyphs invalid char index") when I try to insert 4 additional glyphs.
You're barking way up the wrong tree; subclassing NSLayoutManager in this situation is overkill. Your problem is merely one of swapping stretches of text (replacing the abbreviation with the original, or the original with the abbreviation), so just do that: in the text, i.e. the underlying NSMutableAttributedString being displayed.
You say in a comment "some words map to the same abbreviation". No problem. Assuming you know the original word (the problem would not be solvable if you did not), store that original word as part of the NSMutableAttributedString, i.e. as an attribute in the place where the word is. Thus, when you substitute the abbreviation, the attribute remains, and thus the original word is retained, ready for you when you need to switch it back.
For example, given this string: @"I love New York", you can hide the word "New York" as an attribute in the same stretch of text occupied by "New York":
[attributedString addAttribute:@"realword" value:@"New York" range:NSMakeRange(7,8)];
Now you can set that range's text to @"NY" but the attribute remains, and you can consult it when the time comes to switch the text back to the unabbreviated form.
(I have drawn out this answer at some length because many people are unaware that you are allowed to define your own arbitrary NSAttributedString attributes. It's an incredibly useful thing to do.)
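A minimal sketch of that idea, assuming the "realword" key from the answer and two hypothetical helper functions: one abbreviates a range while storing the original word, the other restores every tagged stretch. It relies on replaceCharactersInRange:withString: giving the new characters the attributes of the first replaced character, so the stored word survives each swap.

```objc
#import <Foundation/Foundation.h>

// Custom attribute key from the answer; any string not used by the system works.
static NSAttributedStringKey const RealWordAttributeName = @"realword";

// Hypothetical helper: stores the original word in a custom attribute, then
// swaps the visible characters for `abbreviation`. The new characters inherit
// the attributes of the first replaced character, so the stored word survives.
static NSRange AbbreviateRange(NSMutableAttributedString *text, NSRange range,
                               NSString *abbreviation) {
    NSString *original = [text.string substringWithRange:range];
    [text addAttribute:RealWordAttributeName value:original range:range];
    [text replaceCharactersInRange:range withString:abbreviation];
    return NSMakeRange(range.location, abbreviation.length);
}

// Hypothetical helper: restores every stretch tagged with RealWordAttributeName.
static void ExpandAllAbbreviations(NSMutableAttributedString *text) {
    NSMutableArray<NSValue *> *ranges = [NSMutableArray array];
    NSMutableArray<NSString *> *originals = [NSMutableArray array];
    [text enumerateAttribute:RealWordAttributeName
                     inRange:NSMakeRange(0, text.length)
                     options:0
                  usingBlock:^(id value, NSRange range, BOOL *stop) {
        if (value != nil) {
            [ranges addObject:[NSValue valueWithRange:range]];
            [originals addObject:value];
        }
    }];
    // Replace from the end so earlier ranges stay valid as lengths change.
    for (NSInteger i = (NSInteger)ranges.count - 1; i >= 0; i--) {
        [text replaceCharactersInRange:ranges[i].rangeValue withString:originals[i]];
    }
}
```

Because the attribute travels with the text, the restored "New York" keeps it too, so the same string can be abbreviated again later.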

Building a CGPath around sentences in UITextView is incredibly slow at -positionFromPosition:

I'm doing some text analysis and have run into an annoying performance bump that I can't figure out how to optimize. I start with the text from a UITextView and split the text into an array of sentences, splitting on the characters ".?!".
Then I loop over each sentence, splitting the sentence into an array of words, and pulling the first and last word from the sentence. With the NSRange of the sentence text in hand, I find the range of the first and last word in the UITextView's text.
The following part is where I get nailed with performance drains. This is how I find the bounding CGRect of the first and last word:
// the from range is increased each iteration
// so i'm not searching the entirety of the text each pass
NSRange range = [textView.text rangeOfString:firstWord options:kNilOptions range:fromRange];
UITextPosition *beginning = textView.beginningOfDocument;
UITextPosition *start = [textView positionFromPosition:beginning offset:range.location];
UITextPosition *end = [textView positionFromPosition:start offset:range.length];
UITextRange *textRange = [textView textRangeFromPosition:start toPosition:end];
firstRect = [textView firstRectForRange:textRange];
I perform this twice, once for the first word and once for the last word.
This works well on smaller text, but approaching 5+ paragraphs Instruments tells me that the UITextView -positionFromPosition: operation is eating up 492ms of clock time, locking up the UI and CPU at 100%.
The thing is, I need the CGRect surrounding the first and last words so I can build a CGPath to highlight the sentence. The entire thing works and looks really great, but it's the hang while the rects are found that is killing me. I'm fairly new to using UITextView, so if there is something I can do, either optimizing my searches with ranges or somehow moving my operations to a background thread, I'd be much obliged.
You'll be better off using UITextView's attributedText property, which takes an NSAttributedString. With that, you can set the NSBackgroundColorAttributeName to a colour over a specific range.
Just note the attributed text methods only work in iOS 6+.
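A rough sketch of that approach. It swaps in Foundation's own sentence segmentation (enumerateSubstringsInRange:options:usingBlock: with NSStringEnumerationBySentences) for the manual ".?!" split, and a hypothetical helper adds an attribute over one sentence. On iOS the key would be NSBackgroundColorAttributeName with a UIColor value, and the result would be assigned to textView.attributedText; the helper takes key and value as parameters so it only needs Foundation.

```objc
#import <Foundation/Foundation.h>

// Collects the range of every sentence in `text` using Foundation's
// sentence segmentation instead of splitting on ".?!" by hand.
static NSArray<NSValue *> *SentenceRanges(NSString *text) {
    NSMutableArray<NSValue *> *ranges = [NSMutableArray array];
    [text enumerateSubstringsInRange:NSMakeRange(0, text.length)
                             options:NSStringEnumerationBySentences |
                                     NSStringEnumerationSubstringNotRequired
                          usingBlock:^(NSString *substring, NSRange substringRange,
                                       NSRange enclosingRange, BOOL *stop) {
        [ranges addObject:[NSValue valueWithRange:substringRange]];
    }];
    return ranges;
}

// Hypothetical helper: highlights sentence number `index` by adding
// `attributeName`/`value` over its range. No geometry is computed at all.
static void HighlightSentence(NSMutableAttributedString *text,
                              NSArray<NSValue *> *sentences, NSUInteger index,
                              NSAttributedStringKey attributeName, id value) {
    NSRange range = sentences[index].rangeValue;
    [text addAttribute:attributeName value:value range:range];
}
```

This avoids positionFromPosition:/firstRectForRange: entirely, since the text system draws the background itself.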

What is the most efficient way to compare an NSString in this way

I have an app (Cocoa Touch, web browser), and I need to be able to compare an NSString with thousands of other strings. Here's the deal.
When a WebView loads, I get the URL. I need to compare this URL with literally thousands of entries (27,847), each of which is a line of text in a plain text file.
I would like to know the best way to go about getting the data from the text file, and comparing it with the NSString. I need to know if the URL that the WebView is loading contains any of these strings.
The app needs to be very fast, so I can't just parse through every line in the text file, turn it into an array, and then compare each and every result.
Please share your ideas. Thanks.
I think the cleanest solution is to:
Create a web service that can offload the work to a server and return a response. Since it sounds like you're building a web protection service, your database may grow to be quite substantial over time, and you can just scale your server up to increase its speed. Furthermore, you don't want to have to update your app every time the lookup data changes.
Other options are:
Use a local SQLite database. SQL databases should perform lookups relatively fast.
If you don't want to use any database, have you tried putting all the search strings into an NSDictionary or NSMutableDictionary object? This way, you would just check whether objectForKey: returns nil for the string you're searching for.
Sample code for this:
NSDictionary *searchDictionary = [NSDictionary dictionaryWithObjectsAndKeys:
                                  [NSNumber numberWithBool:YES], @"google.com",
                                  [NSNumber numberWithBool:YES], @"yahoo.com",
                                  [NSNumber numberWithBool:YES], @"bing.com",
                                  nil];
NSString *searchString = @"bing.com";
if ([searchDictionary objectForKey:searchString]) {
    // search string found
} else {
    // search string not found
}
Note: if you want the NSDictionary to perform case-insensitive comparisons, pre-load all keys lowercase, and lowercase the search string before calling objectForKey:.
How much memory this could take is a whole other story, but I don't see how this comparison could be made much faster locally. I strongly recommend the remote web service approach, though.
Create a string from the file and enumerate through the lines.
NSString *stringToCheck; // the URL string you want to look up
NSData *bytesOfFile = [NSData dataWithContentsOfFile:@"/path/myfile.txt"];
NSString *fileString = [[NSString alloc] initWithData:bytesOfFile
                                             encoding:NSUTF8StringEncoding];
__block BOOL foundMatch = NO;
[fileString enumerateLinesUsingBlock:^(NSString *line, BOOL *stop) {
    if ([stringToCheck isEqualToString:line]) {
        *stop = YES;
        foundMatch = YES;
    }
}];
This is a job for regular expressions. Take all of the substrings you're looking for/filtering against, escape them appropriately (escaping characters such as [, ], |, and \, among others, with \), and join them with a |. The resulting string is your regular expression, which you apply to each URL.
You could loop through an entire array full of substrings, doing rangeOfString:options: with each one, but that's the slow way. A good regular expression implementation is built for this sort of thing, and I would hope that Apple's implementation is suitable.
That said, profile the hell out of it. I've seen some regex implementations choke on the | operator, so you'll want to make sure that Apple's is not one of them.
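A minimal sketch of that idea, assuming NSRegularExpression as the regex engine: +escapedPatternForString: does the escaping, the escaped terms are joined with |, and -firstMatchInString:options:range: answers "does this URL contain any term?". The function names are hypothetical. An empty term list would produce an empty pattern that matches every string, so it is guarded against.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: builds one case-insensitive regex that matches any of
// `terms` literally. Returns nil for an empty list.
static NSRegularExpression *AlternationRegex(NSArray<NSString *> *terms, NSError **error) {
    if (terms.count == 0) {
        return nil;
    }
    NSMutableArray<NSString *> *escaped = [NSMutableArray arrayWithCapacity:terms.count];
    for (NSString *term in terms) {
        // Escapes regex metacharacters such as '.', '?', '[', '|' and the backslash.
        [escaped addObject:[NSRegularExpression escapedPatternForString:term]];
    }
    NSString *pattern = [escaped componentsJoinedByString:@"|"];
    return [NSRegularExpression regularExpressionWithPattern:pattern
                                                     options:NSRegularExpressionCaseInsensitive
                                                       error:error];
}

// Returns YES if `url` contains at least one of the terms compiled into `regex`.
static BOOL URLContainsAnyTerm(NSRegularExpression *regex, NSString *url) {
    if (regex == nil) {
        return NO;
    }
    NSTextCheckingResult *match = [regex firstMatchInString:url
                                                    options:0
                                                      range:NSMakeRange(0, url.length)];
    return match != nil;
}
```

Build the regex once (for example, when the list file is loaded) and reuse it for every URL.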
If you need to compare each string in your text file, you are going to have to compare it; there's no way around it.
What you can do, however, is run the comparison on a background thread while showing a loading indicator, so it won't feel as if the app got stuck.
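A sketch of that suggestion with Grand Central Dispatch, assuming the file has already been read into a string: the line-by-line comparison runs on a global queue and the result is delivered on a caller-chosen queue (normally dispatch_get_main_queue(), so the UI can be updated). FindLineInBackground is a hypothetical name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: runs the (possibly slow) line-by-line comparison off
// the main thread and reports the result on `completionQueue`.
static void FindLineInBackground(NSString *fileContents, NSString *needle,
                                 dispatch_queue_t completionQueue,
                                 void (^completion)(BOOL found)) {
    dispatch_async(dispatch_get_global_queue(QOS_CLASS_USER_INITIATED, 0), ^{
        __block BOOL found = NO;
        [fileContents enumerateLinesUsingBlock:^(NSString *line, BOOL *stop) {
            if ([line isEqualToString:needle]) {
                found = YES;
                *stop = YES;
            }
        }];
        // Hop back to the caller's queue (e.g. the main queue) with the answer.
        dispatch_async(completionQueue, ^{
            completion(found);
        });
    });
}
```

The UI stays responsive, but the total work is unchanged, so this pairs well with one of the faster lookups suggested in the other answers.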
I would suggest you try with NSDictionary first. You can load up all your URLs into this, and internally it will use some sort of hash table/map for very quick (O(1)) lookup.
You can then check the result of [dictionary objectForKey:userURL], and if it returns something then the URL matched one in the dictionary.
The only problem with this is that it requires an exact string match. If your dictionary contains http://server/foobar and the user enters http://server/FOOBAR (because it's a case-insensitive server), you are going to get a miss on your lookup. Similarly, adding ?foobar queries to the end of URLs will result in a miss. You could also add an explicit port with server:80, and with %XX character encoding you can create hundreds of variations of the same URL. You will have to account for this and canonicalize both the URLs in your dictionary, and the URL entered by the user prior to lookup.
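A minimal sketch of that canonicalization, assuming the list file contains full URLs (one per line) and using NSURLComponents to parse them. It swaps the dictionary for an NSSet, since the stored values are never used. Which parts of a URL to normalize is a policy decision; this hypothetical version lowercases the scheme and host, drops default ports (80/443), the query and the fragment, and percent-decodes the path, but leaves the path's case alone because many servers treat paths case-sensitively. Entries without a host (for example a bare "google.com") are skipped.

```objc
#import <Foundation/Foundation.h>

// Hypothetical normalization policy (see above). Returns nil if the string
// cannot be parsed as a URL with a host.
static NSString *CanonicalURLString(NSString *urlString) {
    NSURLComponents *components = [NSURLComponents componentsWithString:urlString];
    if (components == nil || components.host.length == 0) {
        return nil;
    }
    NSString *scheme = components.scheme.lowercaseString ?: @"http";
    NSString *host = components.host.lowercaseString;
    NSNumber *port = components.port;
    BOOL isDefaultPort = port == nil ||
        ([scheme isEqualToString:@"http"] && port.integerValue == 80) ||
        ([scheme isEqualToString:@"https"] && port.integerValue == 443);
    NSString *portPart = isDefaultPort ? @"" : [NSString stringWithFormat:@":%@", port];
    // `path` is already percent-decoded by NSURLComponents.
    NSString *path = components.path.length > 0 ? components.path : @"/";
    // Query and fragment are deliberately dropped.
    return [NSString stringWithFormat:@"%@://%@%@%@", scheme, host, portPart, path];
}

// Builds the lookup set from the file contents, one URL per line.
static NSSet<NSString *> *CanonicalSetFromLines(NSString *fileContents) {
    NSMutableSet<NSString *> *set = [NSMutableSet set];
    [fileContents enumerateLinesUsingBlock:^(NSString *line, BOOL *stop) {
        NSString *key = CanonicalURLString(line);
        if (key != nil) {
            [set addObject:key];
        }
    }];
    return set;
}

// Hash-based membership test on the canonical form of the visited URL.
static BOOL IsListedURL(NSSet<NSString *> *canonicalURLs, NSString *url) {
    NSString *key = CanonicalURLString(url);
    return key != nil && [canonicalURLs containsObject:key];
}
```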

Optimizing scanning large text and matching against list of words or phrases

I'm working on an app that takes an article (simple HTML page), and a list of vocabulary terms (each may be a word, a phrase, or even a sentence), and creates a link for each term it finds. The problem is that for larger texts with more terms it takes a long time. Currently we are dealing with this by initially displaying the unmarked text, processing the links in the background, and finally reloading the web view when processing finishes. Still, it can take a while and some of our users are not happy with it.
Right now the app uses a simple loop on the terms, doing a replacement in the HTML. Basically:
for (int i = 0; i < terms.count; i++) {
    NSString *term = [terms objectAtIndex:i];
    NSString *replaceString = [NSString stringWithFormat:@"<a href=\"myUrl:\\%d\">%@</a>", i, term];
    htmlString = [htmlString stringByReplacingOccurrencesOfString:term
                                                       withString:replaceString
                                                          options:NSCaseInsensitiveSearch
                                                            range:NSMakeRange(0, [htmlString length])];
}
However, we are dealing with multiple languages, so there is not just one replacement per term, but twenty! That's because we have to deal with punctuation at the beginning (upside-down question marks in Spanish) and end of each term. We have to replace "term", "term.", and "term?" with an appropriate hyperlink.
Is there a more efficient method I could use to get this HTML marked up?
I need to keep the index of the original term so that it can be retrieved later when the user clicks the link.
You could process the text as follows:
Instead of looping over the vocabulary, split the text into words and look up each word in the vocabulary.
Create some index, hash table or dictionary to make the lookup efficient.
Don't use stringByReplacingOccurrencesOfString. Each time it's called it makes a copy of the whole text and won't release the memory until the autorelease pool is drained. (Interestingly, you haven't run into memory problems yet.) Instead, use an NSMutableString instance to which you append each word (and the characters between words), either as it was in the original text or decorated as a link.
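A minimal sketch of that single pass, assuming plain text (HTML tags are not skipped here), single-word terms, and a dictionary mapping each lowercased term to its index. enumerateSubstringsInRange:options:usingBlock: with NSStringEnumerationByWords does the word splitting, which also takes care of leading and trailing punctuation such as "¡" or "?". The function name and the myUrl: link format are hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: walks the words of `text` once, looks each one up in
// `termIndex` (lowercased term -> index), and appends either the original
// word or a link to a mutable buffer. Text between words is copied as-is.
static NSString *LinkTerms(NSString *text, NSDictionary<NSString *, NSNumber *> *termIndex) {
    NSMutableString *output = [NSMutableString stringWithCapacity:text.length];
    __block NSUInteger copiedUpTo = 0;
    [text enumerateSubstringsInRange:NSMakeRange(0, text.length)
                             options:NSStringEnumerationByWords
                          usingBlock:^(NSString *word, NSRange wordRange,
                                       NSRange enclosingRange, BOOL *stop) {
        // Copy whitespace/punctuation between the previous word and this one.
        [output appendString:[text substringWithRange:
                              NSMakeRange(copiedUpTo, wordRange.location - copiedUpTo)]];
        NSNumber *index = termIndex[word.lowercaseString];
        if (index != nil) {
            [output appendFormat:@"<a href=\"myUrl:%@\">%@</a>", index, word];
        } else {
            [output appendString:word];
        }
        copiedUpTo = NSMaxRange(wordRange);
    }];
    // Copy whatever follows the last word.
    [output appendString:[text substringFromIndex:copiedUpTo]];
    return output;
}
```

Each dictionary lookup is a hash lookup, so the cost grows with the length of the text rather than with text length times vocabulary size.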
What you're doing right now is this:
for each vocabulary word 'term'
    search the HTML text for instances of term
    replace each instance of term with an appropriate hyperlink
If you have a large text, then each search takes that much longer. Further, every time you do a replacement, you have to create a new string containing a copy of the text to do the replacement on, since stringByReplacingOccurrencesOfString:withString:options:range: returns a new string rather than modifying the existing string. Multiply that by N replacements.
A better option would be to make a single pass through the string, searching for all terms at once, and building up the resulting output string in a mutable string to avoid a Shlemiel the Painter-like runtime.
For example, you could use regular expressions like so:
// Create a regular expression that is an alternation of all of the vocabulary
// words. You only need to create this once at startup.
NSMutableString *pattern = [[[NSMutableString alloc] init] autorelease];
[pattern appendString:@"\\b("];
BOOL isFirstTerm = YES;
for (NSString *term in vocabularyList)
{
    if (!isFirstTerm)
    {
        [pattern appendString:@"|"];
    }
    isFirstTerm = NO;
    // Escape regex metacharacters such as '.', '?' or '(' inside the term
    [pattern appendString:[NSRegularExpression escapedPatternForString:term]];
}
[pattern appendString:@")\\b"];

// Create regular expression object
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern
                                                                        options:NSRegularExpressionCaseInsensitive
                                                                          error:&error];

// Replace vocabulary matches with a hyperlink; in the template, $1 is the
// matched term (the link format here is only an illustration)
NSMutableString *htmlCopy = [[htmlString mutableCopy] autorelease];
[regex replaceMatchesInString:htmlCopy
                      options:0
                        range:NSMakeRange(0, [htmlString length])
                 withTemplate:@"<a href=\"myUrl:$1\">$1</a>"];
// Now use htmlCopy
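A replacement template can only echo the matched text (via $1), but the question needs each link to carry the term's index. One way to keep the single pass while doing that, sketched under the assumption of a regex built as described above and a hypothetical dictionary from lowercased term to index: enumerate the matches and build the output in an NSMutableString, looking up the index for each match. The helper name and link format are hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: one pass over the regex matches, copying the text
// between matches and wrapping each match in a link that carries its index.
static NSString *LinkMatches(NSString *html, NSRegularExpression *regex,
                             NSDictionary<NSString *, NSNumber *> *termIndex) {
    NSMutableString *output = [NSMutableString stringWithCapacity:html.length];
    __block NSUInteger copiedUpTo = 0;
    [regex enumerateMatchesInString:html
                            options:0
                              range:NSMakeRange(0, html.length)
                         usingBlock:^(NSTextCheckingResult *match, NSMatchingFlags flags,
                                      BOOL *stop) {
        NSRange r = match.range;
        [output appendString:[html substringWithRange:
                              NSMakeRange(copiedUpTo, r.location - copiedUpTo)]];
        NSString *term = [html substringWithRange:r];
        NSNumber *index = termIndex[term.lowercaseString];
        if (index != nil) {
            [output appendFormat:@"<a href=\"myUrl:%@\">%@</a>", index, term];
        } else {
            // Case folding can differ between the regex and lowercaseString;
            // leave such matches unlinked rather than emit a broken link.
            [output appendString:term];
        }
        copiedUpTo = NSMaxRange(r);
    }];
    [output appendString:[html substringFromIndex:copiedUpTo]];
    return output;
}
```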
Since the string replace function you're calling is O(n) (it scans and replaces within n words) and you're calling it for m vocabulary terms, you have an O(n·m) algorithm, which behaves like O(n^2) when the vocabulary is about as large as the text.
If you could do it in one pass, that would be optimal (O(n) for n words of HTML). The idea of presenting the un-replaced text first is still a good one, unless the processing becomes unnoticeable even for large documents.
How about a hash set of vocabulary words? Scan through the HTML word by word (skipping HTML markup), and if the current scanned word is in the hash set, append the hyperlink to the target buffer instead of the scanned word. That means you hold at most twice the HTML content plus one hash set of vocabulary words in memory.
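A sketch of that approach, assuming a set of lowercased single-word terms and well-formed markup where tags start with < and end with >. Text between tags is split into words and checked against the set, while tags (including their attribute values) are copied through unchanged, so the output buffer is the only extra copy of the HTML. The function name and link format are hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: copies `html` into a buffer, passing tags through
// untouched and linking words that appear in `vocabulary`.
static NSString *LinkWordsSkippingMarkup(NSString *html, NSSet<NSString *> *vocabulary) {
    NSMutableString *output = [NSMutableString stringWithCapacity:html.length];
    NSUInteger i = 0, length = html.length;
    while (i < length) {
        NSRange tagStart = [html rangeOfString:@"<" options:0 range:NSMakeRange(i, length - i)];
        NSUInteger textEnd = tagStart.location == NSNotFound ? length : tagStart.location;
        // Process the text run [i, textEnd) word by word.
        NSString *run = [html substringWithRange:NSMakeRange(i, textEnd - i)];
        __block NSUInteger copied = 0;
        [run enumerateSubstringsInRange:NSMakeRange(0, run.length)
                                options:NSStringEnumerationByWords
                             usingBlock:^(NSString *word, NSRange r, NSRange er, BOOL *stop) {
            [output appendString:[run substringWithRange:NSMakeRange(copied, r.location - copied)]];
            if ([vocabulary containsObject:word.lowercaseString]) {
                [output appendFormat:@"<a href=\"myUrl:%@\">%@</a>", word.lowercaseString, word];
            } else {
                [output appendString:word];
            }
            copied = NSMaxRange(r);
        }];
        [output appendString:[run substringFromIndex:copied]];
        if (tagStart.location == NSNotFound) {
            break;
        }
        // Copy the tag itself verbatim, up to and including '>'.
        NSRange tagEnd = [html rangeOfString:@">" options:0
                                       range:NSMakeRange(textEnd, length - textEnd)];
        NSUInteger next = tagEnd.location == NSNotFound ? length : NSMaxRange(tagEnd);
        [output appendString:[html substringWithRange:NSMakeRange(textEnd, next - textEnd)]];
        i = next;
    }
    return output;
}
```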
There are two approaches.
Hash maps: if the maximum length of your phrases is limited (for example, to two words), you can iterate over all words and bigrams (two-word sequences) and check each one in a hash map. The complexity is linear, since a hash lookup takes constant time in the ideal case.
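A sketch of the word-plus-bigram lookup, assuming phrases of at most two words stored lowercased in a dictionary, with a single space between the words of a phrase. It reports every hit, including overlapping ones (a bigram and a word inside it); choosing between them is left to the caller. The function name is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: checks every word and every adjacent word pair of
// `text` against `phrases` (lowercased phrase -> index) and returns the
// matching phrases in text order.
static NSArray<NSString *> *FindPhrases(NSString *text,
                                        NSDictionary<NSString *, NSNumber *> *phrases) {
    NSMutableArray<NSString *> *words = [NSMutableArray array];
    [text enumerateSubstringsInRange:NSMakeRange(0, text.length)
                             options:NSStringEnumerationByWords
                          usingBlock:^(NSString *w, NSRange r, NSRange er, BOOL *stop) {
        [words addObject:w.lowercaseString];
    }];
    NSMutableArray<NSString *> *hits = [NSMutableArray array];
    for (NSUInteger i = 0; i < words.count; i++) {
        // Unigram lookup.
        if (phrases[words[i]] != nil) {
            [hits addObject:words[i]];
        }
        // Bigram lookup: this word plus the next one.
        if (i + 1 < words.count) {
            NSString *bigram = [NSString stringWithFormat:@"%@ %@", words[i], words[i + 1]];
            if (phrases[bigram] != nil) {
                [hits addObject:bigram];
            }
        }
    }
    return hits;
}
```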
Automaton theory
You can combine the simple automata that match individual strings into a single one that evaluates faster (i.e. dynamic programming). For example, "John Smith"|"John Stuard" can be merged into John S(mith|tuard); this is the so-called prefix optimisation (http://code.google.com/p/graph-expression/wiki/RegexpOptimization).
A more advanced algorithm can be found here: http://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_string_matching_algorithm
I like this approach more because there is no limit on phrase length and it allows complex regexps to be combined.
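For reference, a compact Aho-Corasick sketch in Objective-C (assuming ARC): a trie of UTF-16 code units with failure links computed breadth-first, then a single scan of the text that reports every (term index, end position) pair. It matches raw code units, so case folding and word boundaries would have to be handled by the caller, e.g. by lowercasing both sides and checking the characters around each hit. Class and function names are hypothetical.

```objc
#import <Foundation/Foundation.h>

// Trie node: outgoing edges, failure link, and indexes of terms ending here.
@interface ACNode : NSObject
@property (nonatomic, strong) NSMutableDictionary<NSNumber *, ACNode *> *next;
@property (nonatomic, weak) ACNode *fail;
@property (nonatomic, strong) NSMutableArray<NSNumber *> *outputs;
@end

@implementation ACNode
- (instancetype)init {
    if ((self = [super init])) {
        _next = [NSMutableDictionary dictionary];
        _outputs = [NSMutableArray array];
    }
    return self;
}
@end

static ACNode *ACBuild(NSArray<NSString *> *terms) {
    ACNode *root = [ACNode new];
    // 1. Insert every term into the trie.
    [terms enumerateObjectsUsingBlock:^(NSString *term, NSUInteger idx, BOOL *stop) {
        ACNode *node = root;
        for (NSUInteger i = 0; i < term.length; i++) {
            NSNumber *c = @([term characterAtIndex:i]);
            ACNode *child = node.next[c];
            if (child == nil) {
                child = [ACNode new];
                node.next[c] = child;
            }
            node = child;
        }
        [node.outputs addObject:@(idx)];
    }];
    // 2. Breadth-first pass computing failure links (root.fail stays nil).
    NSMutableArray<ACNode *> *queue = [NSMutableArray array];
    for (ACNode *child in root.next.allValues) {
        child.fail = root;
        [queue addObject:child];
    }
    for (NSUInteger head = 0; head < queue.count; head++) {
        ACNode *node = queue[head];
        [node.next enumerateKeysAndObjectsUsingBlock:^(NSNumber *c, ACNode *child, BOOL *stop) {
            ACNode *f = node.fail;
            while (f != nil && f.next[c] == nil) {
                f = f.fail;
            }
            child.fail = f ? f.next[c] : root;
            // Inherit matches that end at the failure target (a proper suffix).
            [child.outputs addObjectsFromArray:child.fail.outputs];
            [queue addObject:child];
        }];
    }
    return root;
}

// Scans `text` once; calls `found` with (term index, exclusive end position).
static void ACSearch(ACNode *root, NSString *text,
                     void (^found)(NSUInteger term, NSUInteger end)) {
    ACNode *node = root;
    for (NSUInteger i = 0; i < text.length; i++) {
        NSNumber *c = @([text characterAtIndex:i]);
        while (node != root && node.next[c] == nil) {
            node = node.fail;
        }
        node = node.next[c] ?: root;
        for (NSNumber *t in node.outputs) {
            found(t.unsignedIntegerValue, i + 1);
        }
    }
}
```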

appendAttributedString: in NSMutableAttributedString

I have an NSMutableAttributedString displayContent.
The attributes of the content vary across the string, i.e. the colours and font sizes can vary by letter.
I want to add a new character to the end of the string and for it to pick up the attributes of the last character in displayContent. I cannot know what those attributes are in advance since they are under user control.
When I append the new character (tempAttr):
NSAttributedString * tempAttr = [[NSAttributedString alloc] initWithString:appendage];
[displayContent appendAttributedString:tempAttr];
it appears to reset the attributes of the whole string to the attributes of the new character (which I haven't set since I can't know what they need to be).
How do I get tempAttr to pick up the attributes of the last character in displayContent?
Thanks.
Update.
Made progress on this in a clumsy but functional way.
Copy the attributes dictionary from the last character in the display (displayContent) and then reapply those attributes to the new character being added:
NSMutableDictionary * lastCharAttrs = [NSMutableDictionary dictionaryWithCapacity:5];
if (displayContent.length > 0) {
    // get style of last letter (index length - 1, not 0)
    [lastCharAttrs addEntriesFromDictionary:[displayContent attributesAtIndex:displayContent.length - 1
                                                               effectiveRange:NULL]];
}
NSMutableAttributedString * tempAttr = [[NSMutableAttributedString alloc] initWithString:newCharacter
                                                                             attributes:lastCharAttrs];
[displayContent appendAttributedString:tempAttr]; // Append to content in the display field
I would have hoped there was a more elegant way to do this, such as setting a property of the NSTextField.
I think I discovered a solution to this by accident, then found this page while looking for the answer to the problem I created for myself (the opposite of your issue).
If you do the following:
[[displayContent mutableString] appendString:newCharacter];
You'll end up with newCharacter appended and the previous attributes "stretched" to cover it. I cannot find this behavior documented anywhere, however, so you might be wary of counting on it.
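A related call worth knowing about: as I understand Apple's documentation for -[NSMutableAttributedString replaceCharactersInRange:withString:], when the range has zero length the new characters take the attributes of the preceding character (or of the following one if there is none). Inserting at the very end therefore extends the last run, which is essentially the behaviour described above. The helper name is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: appends `suffix` so that it picks up the attributes of
// the current last character, by inserting into an empty range at the end.
static void AppendInheritingLastAttributes(NSMutableAttributedString *text, NSString *suffix) {
    [text replaceCharactersInRange:NSMakeRange(text.length, 0) withString:suffix];
}
```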