cocos2d frame rate lag on dictionary creation and search - objective-c

I am trying to create a create a simple iPhone game that would throughout the course of running be doing multiple checks to see if user input was a real word. I have a 1.7mb text file (is this a reasonable size?) with each word on its own line containing all of the words in the english language. This is the code that runs in the init method of the game scene. correctWords is an array that will contain all of the users verified word guesses. This code parses through the text file and puts all of the words into an array called currentDict:
correctWords = [[NSMutableArray alloc] init];
//set where to get the dictionary from
NSString *filePath = [[NSBundle mainBundle] pathForResource: [NSString stringWithFormat: #"dictionary"] ofType:#"txt"];
//pull the content from the file into memory
NSData* data = [NSData dataWithContentsOfFile:filePath];
//convert the bytes from the file into a string
NSString* string = [[[NSString alloc] initWithBytes:[data bytes]
length:[data length]
encoding:NSUTF8StringEncoding] autorelease];
//split the string around newline characters to create an array
NSString* delimiter = #"\n";
currentDict = [string componentsSeparatedByString:delimiter];
[currentDict retain];
and then to verify if the word the user inputs is in fact a word I have this check
if([currentDict containsObject: userInput]){
Whenever the game scene loads, there is a very noticeable delay (3-4 seconds) on the device itself, although there it happens almost instantly in the simulator, and then also I have animations running throughout most of the game, but whenever it tries to verify a word, there is a slight but noticeable lag in the animations. I am just wondering if there is a better way to get the dictionary loaded into memory, or if there is some kind of standard practice for verifying words. Also why would checking if it is a word cause a lag in the animation? I had assumed the animation was part of its own thread (and thus would theoretically not be affected)

I would recommend an alternative approach. I don't know how your game works, but it might make sense to give the player a limited set of possible word choices, for example something like Draw Something where there are only so many words you could type; then you would test against far fewer. Before the scene loads, you can have the set of possible words selected from your dictionary then provide letters or options (whatever your game is going) that only allows the user to come up with words that are in that set. Then you can test against a small set.
Another option is to repeat what I've said above frequently throughout your level, so the amount of available words are constantly changing, but load that set periodically when you're not in the middle of an animation or whatever. If there is a brief pause in the game play as the level gets harder, then load new words, or something similar.
That way the real-time game play is not affected by a large dictionary but you can still offer many options throughout the gameplay.

Nothing surprising that comparing thousands of string takes some time and causes lag in animation. You should something read about binary search, hashing, etc. Also loading entire file into NSString and then splitting it is very slow. Your code is just awful, sorry.

Related

Removing non-ascii characters from NSData?

First off, I'm not exactly sure what is happening or if I fully understand it enough to describe the issue so I'll try my best.
I'm encoding a NSData object that contains json and one of the objects contains a degree symbol. We believe this what is causing the issue and would like to remove it before encoding since the problem occurs during encoding.
I have plenty of options out there for removing certain characters from strings but none from doing it from the NSData object itself. Wondering if this is even possible or if its an issue with how I'm already encoding it.
This is how the NSData object is being encoded and turned back into a NSData object to serialize it to json. Right now I'm not trying to remove the degree symbol, using Latin 1 because another character I want to use but do not need it, this probably isn't the best way to do but it works for majority of other data objects that pass through it just not this one so this needs to change.
NSString* stringISOLatin1 = [NSString stringWithCString:data.bytes encoding:NSISOLatin1StringEncoding];
NSData* dataUTF8 = [stringISOLatin1 dataUsingEncoding:NSUTF8StringEncoding allowLossyConversion:NO];
The results are a little weird, most of the time it works fine, even including the degree symbol in the text when displayed on screen. Other times after encoding the string comes back messed up at the end which makes it unable to be serialized.
Any help would be appreciated even if it just leads to a better explanation of what is happen. Thanks
The problem is likely that you are using NSString:stringWithCString:encoding: to convert your data object. This function requires the data to be null terminated. NSData objects do not have to be NULL terminated because they have an explicit length. If the NULL character is missing it will continue to read whatever there happens to be after the string, giving you either garbage at the end or possibly crash because of memory violation.
Instead try using this:
NSString *stringISOLatin1 = [[NSString alloc] initWithData:data encoding:NSISOLatin1StringEncoding];

Suggestions to Process Complex + Large Data File

I have a very large and complex data file (.txt, see snippet below) of about 10MB and would like to know the best way to store it and access it later on.
My app currently uses core data for storage of other entities but I don't see how I can create an entity from this type of data file because of its complexity.
This file is divided as follows:
First line of each major section begins with an A| and means a new 'airway' to be defined. Then, is it's name, so in the example below we have the airway named V320 and another named V321. On the following lines, we have important data, the 'points'/waypoints which make up this airway. Each one has a name, and coordinates. So the first one here is PLN at 45.63N and -84.66W (coordinates). Then, from there the next one is LORIW at 45.35N and -84.92W, from LORIW we go to IROTO, and so on...
NOTE: There may be two, three, maybe even 4 airways with the same 'name' like V320 for example has 3...but each one is in it's own part of the map.
The other values there are irrelevant such as the numbers after the coordinate pair.
In essence, I need all this so that I can then draw lines on my map (GMSPolyLine using Google map SDK) which goes through all these points for each airway and then to create GMSMarkers(google version of MKAnnotation) for each waypoint which the user can tap.
I can handle the drawing of lines/markers on the map but the difficult part for me to visualize is the manipulation of this data and making it easier to access.
Let me know if you have any questions.
A|V320|20
S|PLN|045630647|-0084664108|LORIW|045352072|-0084924214|0|219|1998
S|LORIW|045352072|-0084924214|IROTO|045188989|-0085075111|219|219|1168
S|IROTO|045188989|-0085075111|ADENO|045030644|-0085220425|219|219|1132
S|ADENO|045030644|-0085220425|TIDDU|044877978|-0085359767|215|215|1090
S|TIDDU|044877978|-0085359767|SKIPR|044831714|-0085401772|215|215|330
.....
A|V321|29
S|PZD|031655206|-0084293100|KUTVE|031866950|-0084451303|0|329|1505
S|KUTVE|031866950|-0084451303|DUVAT|031948772|-0084512695|329|329|582
S|DUVAT|031948772|-0084512695|LUMPP|032041158|-0084582139|329|329|657
S|LUMPP|032041158|-0084582139|PREST|032176375|-0084684117|329|329|963
S|PREST|032176375|-0084684117|CSG|032615253|-0085017631|326|326|3129
S|CSG|032615253|-0085017631|JALVO|032722436|-0085064033|326|339|684
.....
Your data exhibits some regularity. If it is predictable and consistent, just write a parser that iterates through the file and creates appropriate Core Data entities.
For example, the fact that each new airway is separated by a newline can help you find those. Also, each final waypoint is repeated in the next line unless you are at the end of an airway record. I think you can do this in maybe 20-30 lines of code.
On your development machine (or even on an iPad or recent iPhone, for that matter), even creating a 10MB array in memory (to be parsed) should not be a constraint.
If the data is static, you can use the resulting sqlite database as a read-only persistent store that you can include in your app bundle.
As for the parser, it would be something like this:
NSString *file = [[NSString alloc] initWithContentsOfFile:fileURLString
encoding:NSUTF8StringEncoding error:nil];
NSArray *lines = [file componentsSeparatedByString:#"\n"];
for (NSString *line in lines) {
if (line.length < 1) { continue; }
NSArray *fields = [line componentsSeparatedByString:#"|"];
if ([fields.firstObject isEqualToString:#"A"]) {
// insert new airway object and populate with other fields
}
else if ([fields.firstObject isEqualToString:#"S"]) {
// insert new waypoint object (two for each first line)
// assign as relationship to the current airway
// and to another waypoint as necessary
}
}
[managedObjectContext save:nil];

What is the most efficient way to compare an NSString in this way

I have an app (Cocoa Touch, Web Browser), however I need to be able to compare an NSString with thousands of other strings. Here's the deal.
When a WebView loads, I get the URL. I need to compare this URL with literally thousands of results (27,847). Each of those numbers represents a line of text in a plain text file.
I would like to know the best way to go about getting the data from the text file, and comparing it with the NSString. I need to know if the URL that the WebView is loading contains any of these strings.
The app needs to be very fast, so I can't just parse through every line in the text file, turn it into an array, and then compare each and every result.
Please share your ideas. Thanks.
I think the cleanest solution is to:
Create a web service that can offload the work to a server and return a response. Since it sounds like you're building a web protection service, your database may grow to be quite substantial over time, and you can just scale your server up to increase its speed. Furthermore, you don't want to have to update your app every time the lookup data changes.
Other options are:
Use a local SQLite database. SQL databases should perform lookups relatively fast.
If you don't want to use any database, have you tried putting all the search strings into an NSDictionary or NSMutableDictionary object? This way, you would just check if the valueForKey: for the string you're searching for is nil.
Sample code for this:
NSDictionary *searchDictionary = [NSDictionary dictionaryWithObjectsAndKeys:
[NSNumber numberWithBool:YES], #"google.com",
[NSNumber numberWithBool:YES], #"yahoo.com",
[NSNumber numberWithBool:YES], #"bing.com",
nil];
NSString *searchString = #"bing.com";
if ([searchDictionary valueForKey:searchString]) {
// search string found
} else {
// search string not found
}
Note: if you want the NSDictionary to perform case-insensitive comparisons, pre-load all values lowercase, and make the search string lowercase when using valueForKey:.
How much memory this could take is a whole other story, but I don't see how this comparison could be made much faster locally. I strongly recommend the remove web service approach, though.
Create a string from the file and enumerate through the lines.
NSString *stringToCheck;
NSData *bytesOfFile = [NSData dataWithContentsOfFile:#"/path/myfile.txt"];
NSString *fileString = [[NSString alloc] initWithData:bytesOfFile
encoding:NSUTF8Encoding];
__block BOOL foundMatch = NO;
[fileString enumerateLinesUsingBlock:^(NSString *line, BOOL *stop){
if([stringToCheck isEqualToString:line]){
*stop = YES;
foundMatch = YES;
}
}];
This is a job for regular expressions. Take all of the substrings you're looking for/filtering against, escape them appropriately (escaping characters such as [, ], |, and \, among others, with \), and join them with a |. The resulting string is your regular expression, which you apply to each URL.
You could loop through an entire array full of substrings, doing rangeOfString:options: with each one, but that's the slow way. A good regular expression implementation is built for this sort of thing, and I would hope that Apple's implementation is suitable.
That said, profile the hell out of it. I've seen some regex implementations choke on the | operator, so you'll want to make sure that Apple's is not one of them.
If you need to compare each string in your text file, you are going to have to compare it, no way around it.
What you can do however is do it on a background thread while showing some loading or something, and it won't feel as if the app got stuck.
I would suggest you try with NSDictionary first. You can load up all your URLs into this, and internally it will use some sort of hash table/map for very quick (O(1)) lookup.
You can then check the result of [dictionary objectForKey:userURL], and if it returns something then the URL matched one in the dictionary.
The only problem with this is that it requires an exact string match. If your dictionary contains http://server/foobar and the user enters http://server/FOOBAR (because it's a case-insensitive server), you are going to get a miss on your lookup. Similarly, adding ?foobar queries to the end of URLs will result in a miss. You could also add an explicit port with server:80, and with %XX character encoding you can create hundreds of variations of the same URL. You will have to account for this and canonicalize both the URLs in your dictionary, and the URL entered by the user prior to lookup.

Optimizing scanning large text and matching against list of words or phrases

I'm working on an app that takes an article (simple HTML page), and a list of vocabulary terms (each may be a word, a phrase, or even a sentence), and creates a link for each term it finds. The problem is that for larger texts with more terms it takes a long time. Currently we are dealing with this by initially displaying the unmarked text, processing the links in the background, and finally reloading the web view when processing finishes. Still, it can take a while and some of our users are not happy with it.
Right now the app uses a simple loop on the terms, doing a replacement in the HTML. Basically:
for (int i=0; i<terms.count; i++){
NSString *term = [terms objectAtIndex:i];
NSString *replaceString = [NSString stringWithFormat:#"<a href="myUrl:\\%d>%#</a>", i, term];
htmlString = [htmlString stringByReplacingOccurrencesOfString:term
withString:replaceString
options:NSCaseInsensitiveSearch
range:NSMakeRange(0, [htmlString length] )];
}
However, we are dealing with multiple languages, so there is not just one replacement per term, but twenty! That's because we have to deal with punctuation at the beginning (upside-down question marks in Spanish) and end of each term. We have to replace "term", "term.", and "term?" with an appropriate hyperlink.
Is there a more efficient method I could use to get this HTML marked up?
I need to keep the index of the original term so that it can be retrieved later when the user clicks the link.
You could process the text as follows:
Instead of looping over the vocabluary, split the text into words and look up each word in the vocabluary.
Create some index, hash table or dictionary to make the lookup efficient.
Don't use stringByReplacingOccurrencesOfString. Each time it's called it makes a copy of the whole text and won't release the memory until the autopool is drained. (Interestingly, you haven't run into memory problems yet.) Instead use a NSMutableString instance where you append each word (and the characters between them), either as it was in the original text or decorated as a link.
What you're doing right now is this:
for each vocabulary word 'term'
search the HTML text for instances of term
replace each instance of term with an appropriate hyperlink
If you have a large text, then each search takes that much longer. Further, every time you do a replacement, you have to create a new string containing a copy of the text to do the replacement on, since stringByReplacingOccurrencesOfString:withString:options:range: returns a new string rather than modifying the existing string. Multiply that by N replacements.
A better option would be to make a single pass through the string, searching for all terms at once, and building up the resulting output string in a mutable string to avoid a Shlemiel the Painter-like runtime.
For example, you could use regular expressions like so:
// Create a regular expression that is an alternation of all of the vocabulary
// words. You only need to create this once at startup.
NSMutableString *pattern = [[[NSMutableString alloc] init] autorelease];
[pattern appendString:#"\\b("];
BOOL isFirstTerm = YES;
for (NSString *term in vocabularyList)
{
if (!isFirstTerm)
{
[pattern appendString:#"|"];
isFirstTerm = NO;
}
[pattern appendString:term];
}
[pattern appendString:#")\\b"];
// Create regular expression object
NSError *error = NULL;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern options:NSRegularExpressionCaseInsensitive error:&error];
// Replace vocabulary matches with a hyperlink
NSMutableString *htmlCopy = [[htmlString mutableCopy] autorelease];
[regex replaceMatchesInString:htmlCopy
options:0
range:NSMakeRange(0, [htmlString length])
withTemplate:#"\\1"];
// Now use htmlCopy
Since the string replace function your calling is Order N (it scans an replaces n words) and you're doing it for m vocabulary terms, you have an n^2 algorithm.
If you could do it in one pass, that would be optimal (order n - n words in html). The idea of presenting the un-replaced text first is still a good one unless it's unnoticeable even for large docs.
How about a hashset of vocabulary words, scan through the html word by (skipping html markup) and if the current scanned word is in the hash set, append that to the target buffer instead of the scanned word. That allows you to have 2 X the html content + 1 hash of vocabulary words in memory at most.
There are two approaches.
Hash Maps - if maximal length of you phrases is limited for example by two, you can iterate over all words and bigrams(2-words) and check them in HashMap - complexity is liniar, since Hash is constant time in ideal
Automaton theory
You can combine simple automatons which mach strings to single one and evaluation faster(i.e. dynamic programming). For example we have "John Smith"|"John Stuard" merge them and we get John S(mith|tuard) it is so called prefix optimisation(http://code.google.com/p/graph-expression/wiki/RegexpOptimization)
More advenced algorithm can be found here http://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_string_matching_algorithm
I like this approach more becouse there are no limitation of phrase length and it allow to combine complex regexps.

Objective-C memory management problem

I'm getting an EXC_BAD_ACCESS error, and It's because of this part of code. Basically, I take an input and do some work on it. After multiple inputs, it throws the error. Am I doing something wrong with my memory here? I'd post the rest of the code, but it's rather long -- and I think this may be where my problem lies (It's where Xcode points me, at least).
-(IBAction) findShows: (id) clicked
{
char urlChars[1000];
[self getEventURL: urlChars];
NSString * theUrl = [[NSString alloc] initWithFormat:#"%s", urlChars];
NSData *data = [NSData dataWithContentsOfURL:[NSURL URLWithString:theUrl]];
int theLength = [data length];
NSString *content = [NSString stringWithUTF8String:[data bytes]];
char eventData[[data length]];
strcpy(eventData, [content UTF8String]);
[self parseEventData: eventData dataLength: theLength];
[whatIsShowing setStringValue:#"Showing events by this artist"];
}
When a crash occurs, there will be a backtrace.
Post it.
Either your program will break in the debugger, and the call stack will be in the debugger UI (or you can type 'bt
With that, the cause of the crash is often quite obvious. Without that, we are left to critique the code.
So, here goes....
char urlChars[1000];
[self getEventURL: urlChars];
This is, at best, a security hole and, at worst, the source of your crash. Any time you are going to copy bytes into a buffer, there should be some kind of way to (a) limit the # of bytes copied in (pass the length of the buffer) and (b) the # of bytes copied is returned (0 for failure or no bytes copied).
Given the above, what happens if there are 1042 bytes copied into urlChars by getEventURL:? boom
NSString * theUrl = [[NSString alloc] initWithFormat:#"%s", urlChars];
This is making some assumptions about urlChars that will lead to failure. First, it assumes that urlChars is of a proper %s compatible encoding. Secondly, it assumes that urlChars is NULL terminated (and didn't overflow the buffer).
Best to use one of the various NSString methods that create strings directly from the buffer of bytes using a particular encoding. More precise and more efficient.
NSData *data = [NSData dataWithContentsOfURL:[NSURL URLWithString:theUrl]];
I hope this isn't on the main thread... 'cause it'll block if it is and that'll make your app unresponsive on slow/flaky networks.
int theLength = [data length];
NSString *content = [NSString stringWithUTF8String:[data bytes]];
char eventData[[data length]];
strcpy(eventData, [content UTF8String]);
This is about the least efficient possible way of doing this. There is no need to create an NSString instance just to then turn it into a (char *). Just grab the bytes from the data directly.
Also -- are you sure that the data returned is NULL terminated? If not, that strcpy() is gonna blow right past the end of your eventData buffer, corrupting the stack.
[self parseEventData: eventData dataLength: theLength];
[whatIsShowing setStringValue:#"Showing events by this artist"];
What kind of data are you parsing that you really want to parse the raw bytes? In almost all cases, such data should be of some kind of structured type; XML or, even, HTML. If so, there is no need to drop down to parsing the raw bytes. (Not that raw data is unheard of -- just odd).
The bytes you get from [content UTF8String] could conceivably be different in number from the value of [data length]. Try using strncpy() instead and see if that still crashes. (It's also possible that getEventURL: sometimes fails to return a string in the format expected, but that's impossible to tell without the source to that method.)
Is it possible that the string contained in urlChars sometimes comes back non-NULL-terminated? You might want to try zeroing out the array, for example using bzero.
Additionally, there are a bunch of techniques for debugging EXC_BAD_ACCESS. Since you're doing a lot of pure C string manipulation, the usual method of turning on NSZombieEnabled may or may not help you (though I recommend turning it on regardless). Another technique you can try is recovering a previous stack frame using GDB. See my previous answer to a similar question if you're interested.
In my opinion the code is too complex. Do not resort to plain C arrays and strings unless you absolutely have to, they are harder to get right. (It’s no rocket science, but if you play with guns all the time, you will shoot yourself in the foot sooner or later.) Even if you insist on parsing plain C strings, isolate the code using the function interface:
// Callers have to mess with char*.
- (void) parseEventData: (char*) data {…}
// Callers can stay in the Objective-C land.
- (void) parseEventData: (NSString* or NSData*) data {
char *unwrappedData = …;
…
}
I’d certainly think twice before I used strcpy in my code. And I think you are leaking theUrl (although that should not cause EXC_BAD_ACCESS in this case). As for the bug itself, you might be hanging on parts of urlChars or eventData and when those stack-based variables disappear, you cause the segfault?