NSCharacterSet cuts the string - objective-c

I am fetching the latest tweet and showing it in my app. I put it in an NSMutableString and initialize that string as shown below in my xmlparser.m file:
- (void) parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
currentNodeContent = (NSMutableString *) [string stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
}
I can get the tweet, but some tweets are cut off and only part of the text is shown. For example, the tweet is
Video games in the classroom? Social media & #technology can change education http://bit.ly/KfGViF #GOVERNING #edtech
but what it shows is:
#technology can change education http://bit.ly/KfGViF #GOVERNING #edtech
I tried to initialize currentNodeContent in other ways too, but I could not solve the problem.
Do you have any idea why this is happening?

Event-driven (SAX) parsers are free to return only part of the text of a node in a callback. You might only be getting part of the tweet passed in. You should probably accumulate characters in a mutable string until you get a callback indicating the end of the element. See Listing 3 and the surrounding text in this guide.

You've got two problems here:
- (void) parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
currentNodeContent = (NSMutableString *) [string stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
}
Simply casting an NSString to an NSMutableString doesn't work. You have to make a mutable copy yourself or initialise a new NSMutableString using the contents of an NSString.
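Furthermore, the parser is only giving you the last part of the string because of the '&'. The text around an entity reference is typically delivered in separate foundCharacters: callbacks (the text before it, the entity itself, and the text after it), and since each callback overwrites currentNodeContent, only the last piece survives.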
What you probably want to do is:
Before you begin parsing, initialise currentNodeContent so that it is an empty NSMutableString:
currentNodeContent = [NSMutableString string];
As you are parsing, append the characters to currentNodeContent:
[currentNodeContent appendString:[string stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]]];
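Putting both answers together, a minimal sketch of the accumulate-then-finish pattern might look like the following. The class ElementTextCollector, the function CollectElementTexts and the element name passed to it are hypothetical names chosen for illustration, and the code assumes ARC. Unlike the snippet above, it trims once at the end of the element rather than on every chunk, because trimming each chunk can drop the spaces on either side of an entity such as &amp;.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical delegate (not from the original post): collects the complete
// text of every element with a given name. Assumes ARC.
@interface ElementTextCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, copy) NSString *targetElementName;
@property (nonatomic, strong) NSMutableString *buffer;
@property (nonatomic, strong) NSMutableArray<NSString *> *texts;
- (instancetype)initWithElementName:(NSString *)elementName;
@end

@implementation ElementTextCollector

- (instancetype)initWithElementName:(NSString *)elementName {
    if ((self = [super init])) {
        _targetElementName = [elementName copy];
        _texts = [NSMutableArray array];
    }
    return self;
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary<NSString *, NSString *> *)attributeDict {
    if ([elementName isEqualToString:self.targetElementName]) {
        // Start a fresh, genuinely mutable buffer for this element.
        self.buffer = [NSMutableString string];
    }
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    // May be called several times per element: append, never assign.
    // Outside the target element the buffer is nil and this is a no-op.
    [self.buffer appendString:string];
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    if (self.buffer && [elementName isEqualToString:self.targetElementName]) {
        // Trim once, now that the whole text of the element is known.
        NSCharacterSet *ws = [NSCharacterSet whitespaceAndNewlineCharacterSet];
        [self.texts addObject:[self.buffer stringByTrimmingCharactersInSet:ws]];
        self.buffer = nil;
    }
}

@end

// Convenience wrapper: parse xmlData and return the text of each matching element.
static NSArray<NSString *> *CollectElementTexts(NSData *xmlData, NSString *elementName) {
    ElementTextCollector *collector = [[ElementTextCollector alloc] initWithElementName:elementName];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:xmlData];
    parser.delegate = collector;   // the parser does not retain its delegate
    [parser parse];
    return [collector.texts copy];
}
```

Calling CollectElementTexts(data, @"text") on a feed whose tweet body lives in a text element (an assumption about the feed layout) should return the whole tweet, including the part before the ampersand.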

Related

How to convert &#8211,&#8222 etc in Objective-C

I wrote a server in Python that returns a scraped HTML string to a client written in Objective-C.
But when I display the returned string on the client side, it contains &#8211, &#8222, etc., and I don't know why.
Do you have any idea? I would like to convert them correctly in Objective-C. Thanks in advance.
If you want to stick with Cocoa you could also try NSAttributedString and initWithHTML:documentAttributes:. You will lose the markup then, though:
NSData *data = [@"<html><p>&#8211 Test</p></html>" dataUsingEncoding:NSUTF8StringEncoding];
NSAttributedString *string = [[NSAttributedString alloc] initWithHTML:data documentAttributes:nil];
NSString *result = [string string];
These are HTML entities.
Here is an NSString category for HTML, and here are the methods it provides:
- (NSString *)stringByConvertingHTMLToPlainText;
- (NSString *)stringByDecodingHTMLEntities;
- (NSString *)stringByEncodingHTMLEntities;
- (NSString *)stringWithNewLinesAsBRs;
- (NSString *)stringByRemovingNewLinesAndWhitespace;
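If pulling in a category is not an option, decimal references such as &#8211; can be decoded by hand. The following is only a sketch under stated assumptions: DecodeDecimalCharacterReferences is a hypothetical helper (it is not one of the category methods listed above), it handles decimal references only (no named entities like &amp;nbsp; and no hexadecimal form), and it tolerates a missing trailing semicolon.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: replaces decimal character references (e.g. "&#8211;")
// with the Unicode character they name. Everything else is left untouched.
static NSString *DecodeDecimalCharacterReferences(NSString *input) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"&#([0-9]{1,7});?"
                                                  options:0
                                                    error:NULL];
    NSArray<NSTextCheckingResult *> *matches =
        [regex matchesInString:input options:0 range:NSMakeRange(0, input.length)];
    NSMutableString *result = [input mutableCopy];

    // Replace from the end so the ranges of earlier matches stay valid.
    for (NSTextCheckingResult *match in [matches reverseObjectEnumerator]) {
        NSString *digits = [input substringWithRange:[match rangeAtIndex:1]];
        uint32_t codePoint = (uint32_t)[digits longLongValue];

        // Skip values that are not valid Unicode scalar values.
        BOOL isSurrogate = (codePoint >= 0xD800 && codePoint <= 0xDFFF);
        if (codePoint == 0 || codePoint > 0x10FFFF || isSurrogate) {
            continue;
        }

        // Build a one-character string from the UTF-32 code unit.
        uint32_t littleEndian = CFSwapInt32HostToLittle(codePoint);
        NSString *character =
            [[NSString alloc] initWithBytes:&littleEndian
                                     length:sizeof(littleEndian)
                                   encoding:NSUTF32LittleEndianStringEncoding];
        if (character) {
            [result replaceCharactersInRange:match.range withString:character];
        }
    }
    return result;
}
```

With this, &#8211; becomes an en dash (U+2013) and &#8222; becomes a low double quotation mark (U+201E).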

NSScanner - SLOW performance - (UITableView, NSXMLParser)

I've had a problem that's been bugging me for a few days now.
I'm parsing an RSS feed with NSXMLParser and feeding the results into a UITableView. Unfortunately, the feed returns some HTML which I parse out with the following method:
- (NSString *)flattenHTML:(NSString *)html {
NSScanner *theScanner;
NSString *text = nil;
theScanner = [NSScanner scannerWithString:html];
while ([theScanner isAtEnd] == NO) {
[theScanner scanUpToString:@"<" intoString:NULL] ;
[theScanner scanUpToString:@">" intoString:&text] ;
html = [html stringByReplacingOccurrencesOfString:[NSString stringWithFormat:@"%@>", text] withString:@""];
}
html = [html stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
return html;
}
I currently call this method during the NSXMLParser delegate method:
- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName{
This works beautifully. HOWEVER, it takes a minute or more to parse and flatten the HTML into text and fill the cells. During that interminable minute my UITableView is entirely empty with just a lone spinner spinning. That's not good. This is the last "bug" to squash before I ship this otherwise wonderfully working app.
It works pretty quickly on the iOS simulator, which isn't surprising.
Thanks in advance for any advice.
Your algorithm is not very good. For each tag you try to remove it, even if it has been stripped already. Also, each iteration of the loop causes a copy of the whole HTML string to be made, often without stripping anything out. If you are not using ARC, those copies will also persist until the current autorelease pool gets popped. You are not only wasting memory, you are also doing a lot of unnecessary work.
Testing your method (with the Cocoa wikipedia article) takes 3.5 seconds.
Here is an improved version of this code:
- (NSString *)flattenHTML:(NSString *)html {
NSScanner *theScanner = [NSScanner scannerWithString:html];
theScanner.charactersToBeSkipped = nil;
NSMutableString *result = [NSMutableString stringWithCapacity: [html length]];
while (![theScanner isAtEnd]) {
NSString *part = nil;
if ([theScanner scanUpToString:@"<" intoString: &part] && part) {
[result appendString: part];
}
[theScanner scanUpToString:@">" intoString:NULL];
[theScanner scanString: @">" intoString: NULL];
}
return [result stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
}
This will tell the scanner to get every character up to the first < and append them to the result string if there are any. Then it will skip up to the next > and then also skip the > to strip out the tag. This will get repeated until the end of the text. Every character is only touched once making this an O(n) algorithm.
This takes only 6.5 ms for the same data. That is about 530 times faster.
By the way, those measurements were made on a Mac. The exact values will of course be different on an iPhone.
I ran into a similar problem and couldn't make it any faster. Instead, I added a progress bar to show how far the parsing has progressed.
The code below is part of that.
// at first, count the lines of XML file
NSError *error = nil;
NSString *xmlFileString = [NSString stringWithContentsOfURL:url
encoding:NSUTF8StringEncoding
error:&error];
_totalLines = [xmlFileString componentsSeparatedByString:@"\n"].count;
// do other things...
// delegate method when the parser find new section
- (void)parser:(NSXMLParser *)parser
didStartElement:(NSString *)elementName
namespaceURI:(NSString *)namespaceURI
qualifiedName:(NSString *)qName
attributes:(NSDictionary *)attributeDict
{
// do something ...
// back to main thread to change app appearance
NSOperationQueue *mainQueue = [NSOperationQueue mainQueue];
[mainQueue addOperationWithBlock:^{
// Here is important. Get the line number and update the progress bar.
_progressView.progress = (CGFloat)[parser lineNumber] / (CGFloat)_totalLines;
}];
}
I have a sample project on GitHub that you can download and run. I hope my code is of some help to you.
https://github.com/weed/p120727_XMLParseProgress
I'm not sure what exactly the problem is. Is it that the flattenHTML method takes a long time to finish, or that it blocks your app while it's running?
If it is the latter, and assuming flattenHTML is doing everything right and really does take that long, the only thing you can do is make sure you are not blocking the main thread while it runs. You can use GCD or NSOperation to achieve this. Beyond that, all you can do is let the user know that you are parsing the data and let them decide whether to wait or to cancel the operation and do something else.
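As a rough sketch of the GCD route: the slow work runs on a global queue and the result is handed back on the main queue, where it is safe to update the table view. RunInBackground is a hypothetical helper name, and the blocks passed to it stand in for whatever parsing and UI code the app already has.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: runs `work` off the main thread, then delivers its
// result to `completion` on the main queue.
static void RunInBackground(NSString *(^work)(void),
                            void (^completion)(NSString *result)) {
    dispatch_async(dispatch_get_global_queue(QOS_CLASS_USER_INITIATED, 0), ^{
        // The slow part (parsing, flattening HTML) happens here.
        NSString *result = work();

        dispatch_async(dispatch_get_main_queue(), ^{
            // Back on the main thread: safe to touch UIKit from here,
            // e.g. store the text and reload the table view.
            completion(result);
        });
    });
}
```

A call site might pass a block that returns [self flattenHTML:html] as the work and a block that reloads the table as the completion. The main thread stays responsive in the meantime, so a spinner or progress bar can keep animating.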

Cut out a part of a long NSString

In my app I want to show a string that contains news. This string is loaded from a free website, so the plain source code of the website does not contain only my string; it is more or less like this:
Stuff
More Stuff
More HTML Stuff
My String
More HTML Stuff
Final Stuff
And of course I want to cut off all the HTML stuff that I don't want in my NSString. Since I am going to change the string from time to time, the overall length of the website's source code changes. This means that substringFromIndex won't work. Is there any other way to convert the complete source code to just the string that I need?
There are zillions of ways to manipulate text. I would start with regular expressions. If you give more details about the specifics of your problem, you can get more specific help.
Edit
Thanks for the link to the website. That gives me more to work with. If you will always know the id of the div whose contents you want, you can use NSXMLParser to extract the text of the div. This will set the text of an NSTextField to the contents of the div with id "I3_sys_txt". I did this on the Mac but I believe it will work on the iPhone as well.
-(IBAction)buttonPressed:(id)sender {
captureCharacters = NO;
NSURL *theURL = [NSURL URLWithString:@"http://maxnerios.yolasite.com/"];
NSXMLParser *parser = [[NSXMLParser alloc] initWithContentsOfURL:theURL];
[parser setDelegate:self];
[parser parse];
[parser release];
}
- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qualifiedName attributes:(NSDictionary *)attributeDict {
if ([elementName isEqual:@"div"] && [[attributeDict objectForKey:@"id"] isEqual:@"I3_sys_txt"]) {
captureCharacters = YES;
divCharacters = [[NSMutableString alloc] initWithCapacity:500];
}
}
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
if (captureCharacters) {
//from parser:foundCharacters: docs:
//The parser object may send the delegate several parser:foundCharacters: messages to report the characters of an element.
//Because string may be only part of the total character content for the current element, you should append it to the current
//accumulation of characters until the element changes.
[divCharacters appendString:string];
}
}
- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
if (captureCharacters) {
captureCharacters = NO;
[textField setStringValue:divCharacters];
[divCharacters release];
}
}
Here is the NSRegularExpression class that you need to use: NSRegularExpression
Here is a 'beginning' tutorial on how to use the class: Tutorial
Here is a primer on what regular expressions are: Regex Primer
Here is an online regular expression tester: Tester
The tester may not work exactly as NSRegularExpression but it will help you understand regex definitions in general. Regular expressions are a key tool for software developers, a little daunting at first, but they can be used to great effect when searching or manipulating strings.
Although this looks like a lot of work, there is no 'quick answer' to what you are attempting. You ask "Is there any other way to convert the complete source code to just the string that I need?" and the answer is yes: regular expressions. But you need to define what 'just the string that I need' means, and regular expressions are one important way of doing that.
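To make the regular-expression route concrete, here is a minimal sketch that pulls out the contents of a div with a known id. TextOfDivWithID is a hypothetical helper, and the pattern rests on assumptions: the id attribute is double-quoted, and the div contains no nested div (the non-greedy match stops at the first closing tag). For anything more irregular, a real parser is the safer choice.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the raw contents of the first
// <div ... id="divID" ...> ... </div> in html, or nil if there is none.
static NSString *TextOfDivWithID(NSString *html, NSString *divID) {
    if (html.length == 0 || divID.length == 0) {
        return nil;
    }

    // Escape the id so characters like '.' are matched literally.
    NSString *escapedID = [NSRegularExpression escapedPatternForString:divID];
    NSString *pattern =
        [NSString stringWithFormat:@"<div\\b[^>]*\\bid=\"%@\"[^>]*>(.*?)</div>", escapedID];

    NSRegularExpressionOptions options =
        NSRegularExpressionCaseInsensitive | NSRegularExpressionDotMatchesLineSeparators;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:options
                                                    error:NULL];
    if (!regex) {
        return nil;
    }

    NSTextCheckingResult *match =
        [regex firstMatchInString:html options:0 range:NSMakeRange(0, html.length)];
    if (!match) {
        return nil;
    }

    // Capture group 1 is everything between the opening and closing tags.
    return [html substringWithRange:[match rangeAtIndex:1]];
}
```

For the page discussed above, TextOfDivWithID(pageSource, @"I3_sys_txt") would be the call to try once the page source has been loaded into a string.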

Is there a way to "auto detect" the encoding of a resource when loading it using stringWithContentsOfURL?

Is there a way to "auto detect" the encoding of a resource when loading it using stringWithContentsOfURL? The current (non-deprecated) method, + (id)stringWithContentsOfURL:(NSURL *)url encoding:(NSStringEncoding)enc error:(NSError **)error;, requires an encoding. I've noticed that getting it wrong does make a difference for what I want to do. Is there a way to check this somehow and always get it right? (Right now I'm using UTF-8.)
I'd try this function from the docs
Returns a string created by reading data from a given URL and returns by reference the encoding used to interpret the data.
+ (id)stringWithContentsOfURL:(NSURL *)url usedEncoding:(NSStringEncoding *)enc error:(NSError **)error
This seems to guess the encoding and then return it to you.
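A minimal sketch of calling that method follows. LoadStringDetectingEncoding is a hypothetical wrapper name, and how well the detection works for a given resource is not something this sketch can promise, so the error path is worth keeping.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical wrapper: loads the resource at url, letting Foundation pick
// the encoding, and reports the encoding it used through outEncoding.
static NSString *LoadStringDetectingEncoding(NSURL *url, NSStringEncoding *outEncoding) {
    NSStringEncoding detected = 0;
    NSError *error = nil;
    NSString *contents = [NSString stringWithContentsOfURL:url
                                              usedEncoding:&detected
                                                     error:&error];
    if (!contents) {
        // Detection or loading failed; the caller can fall back to
        // trying explicit encodings.
        NSLog(@"Could not load %@: %@", url, error);
        return nil;
    }
    if (outEncoding) {
        *outEncoding = detected;
    }
    return contents;
}
```

The reported encoding can be turned into a readable name with [NSString localizedNameOfStringEncoding:], which is handy for logging what was actually chosen.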
What I normally do when converting data (encoding-less string of bytes) to a string is attempt to initialize the string using various different encodings. I would suggest trying the most limiting (charset wise) encodings like ASCII and UTF-8 first, then attempt UTF-16. If none of those are a valid encoding, you should attempt to decode the string using a fallback encoding like NSWindowsCP1252StringEncoding that will almost always work. In order to do this you need to download the page's contents using NSData so that you don't have to re-download for every encoding attempt. Your code might look like this:
NSData * urlData = [NSData dataWithContentsOfURL:aURL];
NSString * theString = [[NSString alloc] initWithData:urlData encoding:NSASCIIStringEncoding];
if (!theString) {
theString = [[NSString alloc] initWithData:urlData encoding:NSUTF8StringEncoding];
}
if (!theString) {
theString = [[NSString alloc] initWithData:urlData encoding:NSUTF16StringEncoding];
}
if (!theString) {
theString = [[NSString alloc] initWithData:urlData encoding:NSWindowsCP1252StringEncoding];
}
// ...
// use theString here...
// ...
[theString release];

How to save a text document in Cocoa with specified NSString encoding?

I'm trying to create a simple text editor like Textedit for Mac OS X, but after many hours of research can't figure out how to correctly write my document's data to a file. I'm using the Cocoa framework and my application is document-based. Looking around in the Cocoa API I found a brief tutorial, "Building a text editor in 15 minutes" or something like this, that implements the following method to write the data to a file:
- (NSData *)dataOfType:(NSString *)typeName error:(NSError **)outError {
[textView breakUndoCoalescing];
NSAttributedString *string=[[textView textStorage] copy];
NSData *data;
NSMutableDictionary *dict=[NSDictionary dictionaryWithObject:NSPlainTextDocumentType forKey:NSDocumentTypeDocumentAttribute];
data=[string dataFromRange:NSMakeRange(0,[string length]) documentAttributes:dict error:outError];
return data;
}
This just works fine, but I'd like to let the user choose the text encoding. I guess this method uses an "automatic" encoding, but how can I write the data using a predefined encoding? I tried using the following code:
- (NSData *)dataOfType:(NSString *)typeName error:(NSError **)outError {
[textView breakUndoCoalescing];
NSAttributedString *string=[[textView textStorage] copy];
NSData *data;
NSInteger saveEncoding=[prefs integerForKey:@"saveEncoding"];
// if the saving encoding is set to "automatic"
if (saveEncoding<0) {
NSMutableDictionary *dict=[NSDictionary dictionaryWithObject:NSPlainTextDocumentType forKey:NSDocumentTypeDocumentAttribute];
data=[string dataFromRange:NSMakeRange(0,[string length]) documentAttributes:dict error:outError];
// else use the encoding specified by the user
} else {
NSMutableDictionary *dict=[NSDictionary dictionaryWithObjectsAndKeys:NSPlainTextDocumentType,NSDocumentTypeDocumentAttribute,saveEncoding,NSCharacterEncodingDocumentAttribute,nil];
data=[string dataFromRange:NSMakeRange(0,[string length]) documentAttributes:dict error:outError];
}
return data;
}
saveEncoding is -1 if the user didn't set a specific encoding, otherwise one of the encodings listed in [NSString availableStringEncodings]. But whenever I try to save my document in a different encoding from UTF8, the app crashes. The same happens when I try to encode my document with the following code:
NSString *string=[[textView textStorage] string];
data=[string dataUsingEncoding:saveEncoding];
What am I doing wrong? It would be great if someone knows how Textedit solved this problem.
Perhaps you remember that NSDictionary can only store objects...
NSDictionary *dict = [NSDictionary dictionaryWithObjectsAndKeys:
NSPlainTextDocumentType,
NSDocumentTypeDocumentAttribute,
[NSNumber numberWithInteger:saveEncoding],
NSCharacterEncodingDocumentAttribute,
nil];