NSScanner - SLOW performance - (UITableView, NSXMLParser) - objective-c

I've had a problem that's been bugging me for a few days now.
I'm parsing an RSS feed with NSXMLParser and feeding the results into a UITableView. Unfortunately, the feed returns some HTML which I parse out with the following method:
- (NSString *)flattenHTML:(NSString *)html {
    NSScanner *theScanner;
    NSString *text = nil;
    theScanner = [NSScanner scannerWithString:html];
    while ([theScanner isAtEnd] == NO) {
        [theScanner scanUpToString:@"<" intoString:NULL];
        [theScanner scanUpToString:@">" intoString:&text];
        html = [html stringByReplacingOccurrencesOfString:[NSString stringWithFormat:@"%@>", text] withString:@""];
    }
    html = [html stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
    return html;
}
I currently call this method during the NSXMLParser delegate method:
- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName{
This works beautifully, HOWEVER it takes about a minute or more to parse and flatten the HTML into text and fill the cell. During that interminable minute my UITableView is entirely empty with just a lone spinner spinning. That's not good. This is the last "bug" to squash before I ship this otherwise wonderfully working app.
It works pretty quickly on the iOS simulator, which isn't surprising.
Thanks in advance for any advice.

Your algorithm is not very good. For each tag you try to remove it, even if it has been stripped already. Also, each iteration of the loop causes a copy of the whole HTML string to be made, often without even stripping anything out. If you are not using ARC, those copies will also persist until the current autorelease pool gets popped. You are not only wasting memory, you are also doing a lot of unnecessary work.
Testing your method (with the Cocoa wikipedia article) takes 3.5 seconds.
Here is an improved version of this code:
- (NSString *)flattenHTML:(NSString *)html {
    NSScanner *theScanner = [NSScanner scannerWithString:html];
    theScanner.charactersToBeSkipped = nil;
    NSMutableString *result = [NSMutableString stringWithCapacity: [html length]];
    while (![theScanner isAtEnd]) {
        NSString *part = nil;
        if ([theScanner scanUpToString:@"<" intoString: &part] && part) {
            [result appendString: part];
        }
        [theScanner scanUpToString:@">" intoString:NULL];
        [theScanner scanString: @">" intoString: NULL];
    }
    return [result stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
}
This will tell the scanner to get every character up to the first < and append them to the result string if there are any. Then it will skip up to the next > and then also skip the > to strip out the tag. This will get repeated until the end of the text. Every character is only touched once making this an O(n) algorithm.
This takes only 6.5 ms for the same data. That is about 530 times faster.
Btw, those measurements were made on a Mac. The exact values will of course be different on an iPhone.

I ran into a similar problem and couldn't make it any faster. Instead, I showed a progress bar to indicate how far the parsing had got.
The code below is part of that.
// First, count the lines of the XML file
NSError *error = nil;
NSString *xmlFileString = [NSString stringWithContentsOfURL:url
                                                   encoding:NSUTF8StringEncoding
                                                      error:&error];
_totalLines = [xmlFileString componentsSeparatedByString:@"\n"].count;
// do other things...

// Delegate method called when the parser finds a new element
- (void)parser:(NSXMLParser *)parser
didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI
 qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict
{
    // do something ...
    // Go back to the main thread to change the app's appearance
    NSOperationQueue *mainQueue = [NSOperationQueue mainQueue];
    [mainQueue addOperationWithBlock:^{
        // This is the important part: get the line number and update the progress bar.
        _progressView.progress = (CGFloat)[parser lineNumber] / (CGFloat)_totalLines;
    }];
}
I have a sample project on GitHub. You can download it and just run it. I hope my code is of some help to you.
https://github.com/weed/p120727_XMLParseProgress

I'm not sure what exactly the problem is. Is it that the flattenHTML method takes a long time to finish, or that it blocks your app while it's running?
If it's the latter, and assuming you are doing everything right in flattenHTML and it really does take a long time to finish, the only thing you can do is make sure you are not blocking the main thread while it runs. You can use GCD or NSOperation for this. Beyond that, all you can do is let the user know that you are parsing the data and let them decide whether to wait or to cancel the operation and do something else.
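As a rough illustration of the GCD suggestion, here is a minimal sketch that runs the tag stripping on a background queue and delivers the result on the main queue. FlattenHTML follows the improved flattenHTML: method shown earlier in this thread, rewritten as a plain function; FlattenHTMLInBackground and its completion block are hypothetical names introduced only for this example.

```objective-c
#import <Foundation/Foundation.h>

// Tag stripper following the improved flattenHTML: method above,
// written as a plain function so the sketch is self-contained.
static NSString *FlattenHTML(NSString *html) {
    NSScanner *scanner = [NSScanner scannerWithString:html];
    scanner.charactersToBeSkipped = nil;
    NSMutableString *result = [NSMutableString stringWithCapacity:html.length];
    while (![scanner isAtEnd]) {
        NSString *part = nil;
        // Keep everything up to the next tag.
        if ([scanner scanUpToString:@"<" intoString:&part] && part) {
            [result appendString:part];
        }
        // Skip the tag itself, including the closing ">".
        [scanner scanUpToString:@">" intoString:NULL];
        [scanner scanString:@">" intoString:NULL];
    }
    return [result stringByTrimmingCharactersInSet:
            [NSCharacterSet whitespaceAndNewlineCharacterSet]];
}

// Hypothetical helper: strips the tags off the main thread, then calls
// `completion` on the main queue, where UI updates are safe.
static void FlattenHTMLInBackground(NSString *html,
                                    void (^completion)(NSString *text)) {
    dispatch_queue_t worker =
        dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    dispatch_async(worker, ^{
        NSString *text = FlattenHTML(html);
        dispatch_async(dispatch_get_main_queue(), ^{
            completion(text);
        });
    });
}
```

In a table view controller, the completion block would typically store the text in the model and then reload the affected rows, so the table stays responsive while the work is in progress.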

Related

How to choose between two elements of the same name when parsing xml

I'm working on parsing xml in school, and I'm using Twitter's API to work with. I'm trying to grab the date the tweet was posted, but I'm running into an issue: there are two elements with the same name that hold different values inside of the xml. One is nested further in than the other, however, so logically I would want to check for the one that is nested. How would I go about doing that?
Everything is running perfectly right now. I just want to add more to my project.
Here's an example of the didEndElement method I'm calling.
-(void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI: (NSString *)namespaceURI qualifiedName:(NSString *)qName
{
    //Creates an instance of the singleton to grab the array stored there
    DataArraySingleton *singleton = [DataArraySingleton sharedArray];
    //easy access to array
    NSMutableArray *array = singleton.theArray;
    //If the element ends with text
    if ([elementName isEqualToString:@"text"])
    {
        //Sets the value/key pair for text
        [tweets setValue:currentXMLValue forKey:elementName];
    }
    //Same as above
    if ([elementName isEqualToString:@"screen_name"])
    {
        [tweets setValue:currentXMLValue forKey:elementName];
    }
    //Same as above
    if ([elementName isEqualToString:@"profile_image_url"])
    {
        [tweets setValue:currentXMLValue forKey:elementName];
    }
    //If the element ends with status
    if ([elementName isEqualToString:@"status"])
    {
        //Adds the objects collected above to the array
        [array addObject:tweets];
        //Resets the object TweetInformation to be called again above
        tweets = nil;
    }
    //Resets xml value
    currentXMLValue = nil;
}
Thanks guys
Add code to check for the parent tag of the nested date element that you want. When you encounter the start tag, set a flag in your code. Now when you encounter the date tag, check if the flag is set or not. Ignore the date tag if the flag isn't set. When you detect the end tag for the parent, reset the flag.
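The flag approach described above could look roughly like the following sketch. It assumes ARC, and it assumes the parent element is called user and the date element created_at; the question does not show the XML, so those names, the NestedDateDelegate class, and the NestedDateFromXML helper are all hypothetical.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical delegate that keeps only the date nested inside <user>.
@interface NestedDateDelegate : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableString *buffer;
@property (nonatomic, copy) NSString *nestedDate;
@property (nonatomic, assign) BOOL insideParent;
@end

@implementation NestedDateDelegate
- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict {
    // Set the flag when the parent's start tag is seen.
    if ([elementName isEqualToString:@"user"]) {
        self.insideParent = YES;
    }
    // Start collecting text for this element.
    self.buffer = [NSMutableString string];
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    [self.buffer appendString:string];
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    // Keep the date only while the flag is set; ignore it otherwise.
    if ([elementName isEqualToString:@"created_at"] && self.insideParent) {
        self.nestedDate = [self.buffer copy];
    }
    // Reset the flag at the parent's end tag.
    if ([elementName isEqualToString:@"user"]) {
        self.insideParent = NO;
    }
    self.buffer = nil;
}
@end

// Hypothetical convenience wrapper: parses `xml` and returns the nested date.
static NSString *NestedDateFromXML(NSString *xml) {
    NSXMLParser *parser = [[NSXMLParser alloc]
        initWithData:[xml dataUsingEncoding:NSUTF8StringEncoding]];
    NestedDateDelegate *delegate = [[NestedDateDelegate alloc] init];
    parser.delegate = delegate;
    [parser parse];
    return delegate.nestedDate;
}
```

To pick the outer date instead, the same flag can be tested the other way round, accepting created_at only while insideParent is NO.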

Parsing With NSXML Parser Cocoa

I am trying to parse the following information from xml in Cocoa.
<level>
<entity>
<name>red</name>
<id>0</id>
<body>false</body>
<x>0.0</x>
<y>0.0</y>
<rotation>0.0</rotation>
</entity>
Here is what I have so far from following Apple's NSXMLParser guide.
NSString* currentElement;
- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName{
    currentElement = elementName;
}
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string{
    if([currentElement isEqualToString:@"name"]){
        NSLog(@"Name found: %@", string);
    }
}
In my foundCharacters method it logs the statement, but the only thing printed is "Name found: " and the rest is blank.
Am I doing this correctly to get the tags in my XML? I would like to extract each tag.
for example:
NSString* name = THE_NAME;
int x = [THE_X_VALUE, intValue];
etc.
Could anyone help me out?
NSXMLParser is one of those classes that sounds good on paper, but drives you insane when you actually start to use it. If you really want to continue on this joyless path, implement parser:didStartElement: and then pray that parser:foundCharacters: delivers the goods.
In my experience, NSXMLParser is like a big toddler who smashes the puzzle and then hands you the pieces one by one. That seems great at first: "Oh look, I've got a piece! And another one! And one more!" but soon you find yourself bewildered: "Oh look, I've got 5.0! Now, was that the rotation or the y coordinate?" Who knows? Not NSXMLParser, that's for sure.
As for the last line of code: to extract the numeric value of a string, NSDecimalNumber is very handy.
if([currentElement isEqualToString:@"x"]){
    // NSDecimalNumber's string factory method is decimalNumberWithString:
    NSDecimalNumber *dec = [NSDecimalNumber decimalNumberWithString:string];
    CGFloat x = dec.floatValue;
}

How does NSXMLParser differentiate between different elements?

I just did a tutorial on NSXMLParser. What I am completely at a loss at is how NSXMLParser differentiates between different elements. To me it seems undefined.
This is my XML
<?xml version="1.0" encoding="UTF-8"?>
<Prices>
<Price id="1">
<name>Rosie O'Gradas</name>
<Beer>4.50</Beer>
<Cider>4.50</Cider>
<Guinness>4</Guinness>
</Price>
</Prices>
And this is my Parser
-(void) parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict {
    if ([elementName isEqualToString:@"Prices"]) {
        app.listArray = [[NSMutableArray alloc] init];
        NSLog(@"The Prices Count");
    }
    else if ([elementName isEqualToString:@"Price"]) {
        thelist = [[List alloc] init];
        thelist.drinkID = [[attributeDict objectForKey:@"id"]integerValue];
    }
}
-(void) parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    if (!currentElementValue) {
        currentElementValue = [[NSMutableString alloc]initWithString:string];
    } else {
        [currentElementValue appendString:string];
    }
}
-(void) parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    if ([elementName isEqualToString:@"Prices"]) {
        return;
    }
    if ([elementName isEqualToString:@"Price"]) {
        [app.listArray addObject:thelist];
        thelist = nil;
    } else {
        [thelist setValue:currentElementValue forKey:elementName];
        currentElementValue = nil;
    }
}
I did notice that the names of the Properties in the data object were the same as in the parser. So I understood that at least.
What I am at a loss at is where it assigns these properties their value.
So at the beginning it initializes the data object with
thelist = [[List alloc] init];
(List is the Data object) But then it does the first thing that I don't understand
thelist.drinkID = [[attributeDict objectForKey:@"id"]integerValue];
Because it is in an if statement won't it get overwritten every time it finds an id attribute. Or is the 'theList' declaration creating multiple objects?
In the foundCharacters method I really have no idea what is going on. As far as I can tell, the foundCharacters string is every bit of text inside the elements. So currentElementValue is really just a bundle of strings appended together (but I can't tell, as for some reason I can't NSLog it).
From there in the didEndElement section, I wonder if this is the correct interpretation of the code.
if ([elementName isEqualToString:@"Price"]) {
    [app.listArray addObject:thelist];
    thelist = nil;
}
I understand that every time that the parser hits the element Price that the app.list array object (declared in another class) has the object added to it 'thelist'.
But here is bit where my lack of understanding in the earlier method takes effect
else {
[thelist setValue:currentElementValue forKey:elementName];
currentElementValue = nil;
}
What are they doing here? From what I see the current element value is just a jumble of characters from the XML file. How is it organized? With the Element Name?
One more question (sorry for the length) why isn't the element name case sensitive, I was experimenting and I found it wasn't. Both languages are case sensitive.
If I interpret your question correctly, it is just about understanding the code which is working fine.
In your XML you have 4 child elements to Price with id=1: name, Beer, Cider and Guinness.
The foundCharacters method will find the characters inside these 4 xml tags, i.e. what is written between <name> and </name>, <Beer> and </Beer>, etc. In your case this is the string Rosie O'Gradas for name, then the string 4.50 for Beer etc.
When characters are found, the method first checks if a container string exists, if not it creates one as currentElementValue. If it does exist, it appends the found characters.
What happens next, logically? It will hit the didEndElement method, in the first case the tag </name>. In this case it will assign the collected text in currentElementValue to the key @"name" and put this key-value pair into the list. The list is of type List, which is defined somewhere else, but it seems to be essentially an NSDictionary.
Because currentElementValue has been stored successfully, it should be destroyed, so the check for its existence next time it hits foundCharacters will work.
Clear?
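To make the setValue:forKey: step more concrete, here is a minimal sketch using key-value coding on a small class. The List class below is a hypothetical stand-in, since the question does not show the real one; it assumes ARC and properties named after the XML elements.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical stand-in for the question's List class: one property per
// XML element name, plus the id attribute.
@interface List : NSObject
@property (nonatomic, assign) NSInteger drinkID;
@property (nonatomic, copy) NSString *name;
@property (nonatomic, copy) NSString *Beer;
@end

@implementation List
@end

// Mimics what the parser callbacks do for one <Price> element.
static List *MakeExampleList(void) {
    List *list = [[List alloc] init];                  // didStartElement: for <Price>
    list.drinkID = 1;                                  // taken from the id attribute
    [list setValue:@"Rosie O'Gradas" forKey:@"name"];  // didEndElement: for </name>, calls -setName:
    [list setValue:@"4.50" forKey:@"Beer"];            // didEndElement: for </Beer>, calls -setBeer:
    return list;
}
```

Each Price start tag creates a fresh object, so drinkID is set once per object rather than overwritten. Key-value coding looks for a setter named set<Key>: with the first letter of the key capitalized, which may be why changing only the case of an element name's first letter appeared to make no difference.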

NSCharacterSet cuts the string

I am fetching the latest tweet and showing it in my app. I put it in an NSMutableString and initialize that string as shown below in my xmlparser.m file:
- (void) parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
currentNodeContent = (NSMutableString *) [string stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
}
I can get the tweet but somehow it cuts some of the tweets and shows some part of it. For example tweet is
Video games in the classroom? Social media & #technology can change education http://bit.ly/KfGViF #GOVERNING #edtech
but what it shows is:
#technology can change education http://bit.ly/KfGViF #GOVERNING #edtech
Why do you think that is? I tried to initialize currentNodeContent in other ways too, but I could not solve the problem.
Do you have any idea why is this happening?
Event-driven (SAX) parsers are free to return only part of the text of a node in a callback. You might only be getting part of the tweet passed in. You should probably accumulate characters in a mutable string until you get a callback indicating the end of the element. See Listing 3 and the surrounding text in this guide.
You've got two problems here:
- (void) parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
currentNodeContent = (NSMutableString *) [string stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
}
Simply casting an NSString to an NSMutableString doesn't work. You have to make a mutable copy yourself or initialise a new NSMutableString using the contents of an NSString.
Furthermore, the text parser is only giving you the last part of the string because it may be interpreting the '&' simply as part of an entity reference, or it may be an entity reference itself.
What you probably want to do is:
Before you begin parsing, initialise currentNodeContent so that it is an empty NSMutableString:
currentNodeContent = [NSMutableString string];
As you are parsing, append the characters to currentNodeContent. Append them untrimmed and trim once when the element ends, because trimming each chunk would drop the spaces at chunk boundaries (for example around the '&'):
[currentNodeContent appendString:string];
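Putting both answers together, a minimal sketch of the accumulate-then-trim pattern might look like this. It assumes ARC and that the tweet lives in an element called text; the TweetTextDelegate class and the TweetTextFromXML helper are hypothetical names used only for this example.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical delegate that collects the full content of a <text> element.
@interface TweetTextDelegate : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableString *currentNodeContent;
@property (nonatomic, copy) NSString *tweetText;
@end

@implementation TweetTextDelegate
- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict {
    // Start with an empty buffer for every element.
    self.currentNodeContent = [NSMutableString string];
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    // May be called several times per element; append every chunk untrimmed.
    [self.currentNodeContent appendString:string];
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    if ([elementName isEqualToString:@"text"]) {
        // Trim once, after the whole element has been collected.
        self.tweetText = [self.currentNodeContent stringByTrimmingCharactersInSet:
                          [NSCharacterSet whitespaceAndNewlineCharacterSet]];
    }
    self.currentNodeContent = nil;
}
@end

// Hypothetical convenience wrapper: parses `xml` and returns the tweet text.
static NSString *TweetTextFromXML(NSString *xml) {
    NSXMLParser *parser = [[NSXMLParser alloc]
        initWithData:[xml dataUsingEncoding:NSUTF8StringEncoding]];
    TweetTextDelegate *delegate = [[TweetTextDelegate alloc] init];
    parser.delegate = delegate;
    [parser parse];
    return delegate.tweetText;
}
```

Because the buffer is only read when the end tag arrives, it does not matter how many pieces the parser splits the text into.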

Cut out a part of a long NSString

In my app I want to show a string that contains news. This string is loaded from a free website, so the plain source code of the website does not contain only my string; it looks more or less like this:
Stuff
More Stuff
More HTML Stuff
My String
More HTML Stuff
Final Stuff
And of course I want to cut off all the HTML stuff that I don't want in my NSString. Since I am going to change the string from time to time, the overall length of the website's source code changes. This means that substringFromIndex won't work. Is there any other way to convert the complete source code to just the string that I need?
There are zillions of ways to manipulate text. I would start with regular expressions. If you give more details about the specifics of your problem, you can get more specific help.
Edit
Thanks for the link to the website. That gives me more to work with. If you will always know the id of the div whose contents you want, you can use NSXMLParser to extract the text of the div. This will set the text of an NSTextField to the contents of the div with id "I3_sys_txt". I did this on the Mac but I believe it will work on the iPhone as well.
-(IBAction)buttonPressed:(id)sender {
    captureCharacters = NO;
    NSURL *theURL = [NSURL URLWithString:@"http://maxnerios.yolasite.com/"];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithContentsOfURL:theURL];
    [parser setDelegate:self];
    [parser parse];
    [parser release];
}
- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qualifiedName attributes:(NSDictionary *)attributeDict {
    if ([elementName isEqual:@"div"] && [[attributeDict objectForKey:@"id"] isEqual:@"I3_sys_txt"]) {
        captureCharacters = YES;
        divCharacters = [[NSMutableString alloc] initWithCapacity:500];
    }
}
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    if (captureCharacters) {
        //from parser:foundCharacters: docs:
        //The parser object may send the delegate several parser:foundCharacters: messages to report the characters of an element.
        //Because string may be only part of the total character content for the current element, you should append it to the current
        //accumulation of characters until the element changes.
        [divCharacters appendString:string];
    }
}
- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    if (captureCharacters) {
        captureCharacters = NO;
        [textField setStringValue:divCharacters];
        [divCharacters release];
    }
}
Here is the NSRegularExpression class that you need to use: NSRegularExpression
Here is a 'beginning' tutorial on how to use the class: Tutorial
Here is a primer on what regular expressions are: Regex Primer
Here is an online regular expression tester: Tester
The tester may not work exactly as NSRegularExpression but it will help you understand regex definitions in general. Regular expressions are a key tool for software developers, a little daunting at first, but they can be used to great effect when searching or manipulating strings.
Although this looks like a lot of work, there is no "quick answer" to what you are attempting. You ask "is there any other way to Convert the complete source code to just the String I need?" and the answer is yes: regular expressions. But you need to define what "just the String that I need" means, and regular expressions are one important way to express that.
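As a rough illustration of the regular-expression route, the sketch below pulls out the contents of a div with a known id using NSRegularExpression. The TextOfDivWithID helper is a hypothetical name, and the pattern assumes the div contains no nested div elements and that the id attribute is written with double quotes; real-world HTML may need a more careful pattern or a proper parser.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text between <div ... id="divID" ...> and
// the next </div>, or nil if no such div is found.
static NSString *TextOfDivWithID(NSString *html, NSString *divID) {
    // Escape the id so that any regex metacharacters in it are taken literally.
    NSString *pattern = [NSString stringWithFormat:
        @"<div[^>]*\\bid=\"%@\"[^>]*>(.*?)</div>",
        [NSRegularExpression escapedPatternForString:divID]];

    NSError *error = nil;
    NSRegularExpression *regex = [NSRegularExpression
        regularExpressionWithPattern:pattern
                             options:NSRegularExpressionDotMatchesLineSeparators
                               error:&error];
    if (!regex) {
        return nil;
    }

    NSTextCheckingResult *match =
        [regex firstMatchInString:html
                          options:0
                            range:NSMakeRange(0, html.length)];
    if (!match) {
        return nil;
    }
    // Capture group 1 holds the div's contents.
    return [html substringWithRange:[match rangeAtIndex:1]];
}
```

For the page in the question, this would be called with the downloaded source and the id I3_sys_txt; the result can then be trimmed or stripped of any remaining tags.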