Web scraping in Objective-C - objective-c

Is there an Objective-C library for parsing HTML, like Python's BeautifulSoup? Thanks

Apple provides NSXMLDocument and NSXMLParser, which can handle tidied HTML input (see the Tree-Based XML Programming Guide).
On iOS (4.3) NSXMLDocument is currently not available, so you'd have to use either NSXMLParser or libxml2.
More information on potential problems with parsing malformed HTML:
What's the best approach for parsing XML/'screen scraping' in iOS? UIWebview or NSXMLParser?
The most reliable solution is to use an off-screen WebView, load the HTML source into it and then access its DOM tree.
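As a rough illustration of the libxml2 route mentioned above, here is a minimal sketch that uses libxml2's HTML parser (which tolerates malformed markup) together with an XPath query to collect link targets. The helper name LinksInHTML is hypothetical, the code assumes ARC, and the project must link against libxml2 and have its headers on the search path.

```objective-c
#import <Foundation/Foundation.h>
#import <libxml/HTMLparser.h>
#import <libxml/xpath.h>

// Hypothetical helper (not part of any library): returns the href value of
// every <a> element found in an HTML string.
static NSArray<NSString *> *LinksInHTML(NSString *html) {
    NSMutableArray<NSString *> *links = [NSMutableArray array];
    NSData *data = [html dataUsingEncoding:NSUTF8StringEncoding];
    if (data.length == 0) return links;

    // HTML_PARSE_RECOVER asks the parser to keep going on malformed markup.
    htmlDocPtr doc = htmlReadMemory(data.bytes, (int)data.length, NULL, "UTF-8",
                                    HTML_PARSE_RECOVER | HTML_PARSE_NOERROR | HTML_PARSE_NOWARNING);
    if (doc == NULL) return links;

    xmlXPathContextPtr ctx = xmlXPathNewContext(doc);
    xmlXPathObjectPtr result =
        ctx ? xmlXPathEvalExpression((const xmlChar *)"//a[@href]", ctx) : NULL;

    if (result != NULL && result->nodesetval != NULL) {
        for (int i = 0; i < result->nodesetval->nodeNr; i++) {
            xmlChar *href = xmlGetProp(result->nodesetval->nodeTab[i],
                                       (const xmlChar *)"href");
            if (href != NULL) {
                NSString *value = [NSString stringWithUTF8String:(const char *)href];
                if (value != nil) [links addObject:value];
                xmlFree(href); // xmlGetProp returns a copy the caller must free
            }
        }
    }

    // libxml2 objects are plain C and are not managed by ARC.
    if (result != NULL) xmlXPathFreeObject(result);
    if (ctx != NULL) xmlXPathFreeContext(ctx);
    xmlFreeDoc(doc);
    return links;
}
```

Calling LinksInHTML with the contents of a downloaded page should give the link targets in document order. Wrappers built on libxml2 expose the same parse-then-XPath idea through an Objective-C interface.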

The best way I have found is NSXMLParser + libtidy. However, many third-party libraries are now available that make parsing easier (the previous answer was written in 2011).
Google's Gumbo HTML5 parser is pretty good. It's written in pure C99 and you can use it with Objective-C (use a wrapper like this one).
If you want pure Objective-C libraries, then Ono or hpple are good. HTMLReader is also a good alternative.
If Swift is your thing, you could use NDHpple, which is a Swift wrapper based on hpple, or you could use Swift-HTML-Parser. (Bonus: Alamofire is as good as Python Requests and is a joy to use.)

Related

How do I convert HTML to Markdown in iOS/OS X? Any way to use pandoc in the app?

I have an HTML string that I'd like to convert to markdown. The best tool I've found to do this is pandoc, which is written in Haskell. How can I get pandoc to run inside a Mac/iOS app? I've heard of compiling Haskell to ARM for incorporation into an iOS project, but I have no idea how to actually get pandoc to compile and work inside an Objective-C app.
Thanks,
Robert
I will attempt to help with the "how do I use pandoc from Objective-C" part of your question. Objective-C is a superset of C, or so I have been told, meaning valid C code is valid Objective-C code.
So your question could have been worded "how do I call Haskell from C", and there is a nice wiki page about that.
How you get all of this working on iOS and ARM is another ball of yarn, and is more likely to be answered if you ask it as a separate question.

How to read and parse XML in Objective-C?

What framework should I use to fetch XML over HTTP and parse it?
Start here:
http://developer.apple.com/library/ios/#documentation/Cocoa/Reference/Foundation/Classes/NSXMLParser_Class/Reference/Reference.html
You can use the initWithContentsOfURL initializer to actually load the document from the HTTP source.
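NSXMLParser is event-driven, so the work happens in a delegate. The following is a minimal sketch, assuming ARC and an XML document that contains title elements; the class TitleCollector and the function TitlesInXMLData are hypothetical names used only for illustration.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical delegate that collects the text of every <title> element.
@interface TitleCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableArray<NSString *> *titles;
@property (nonatomic, strong) NSMutableString *currentText;
@end

@implementation TitleCollector

- (instancetype)init {
    if ((self = [super init])) {
        _titles = [NSMutableArray array];
    }
    return self;
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary<NSString *, NSString *> *)attributeDict {
    // Start buffering text only while inside a <title> element.
    if ([elementName isEqualToString:@"title"]) {
        self.currentText = [NSMutableString string];
    }
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    // May be called several times per element; a nil buffer makes this a no-op.
    [self.currentText appendString:string];
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    if ([elementName isEqualToString:@"title"] && self.currentText != nil) {
        [self.titles addObject:[self.currentText copy]];
        self.currentText = nil;
    }
}

@end

// Returns the collected titles, or nil if the document could not be parsed.
static NSArray<NSString *> *TitlesInXMLData(NSData *data) {
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
    TitleCollector *collector = [[TitleCollector alloc] init];
    parser.delegate = collector; // the delegate is not retained by the parser
    return [parser parse] ? collector.titles : nil;
}
```

To read straight from an HTTP source, the parser can be created with initWithContentsOfURL: instead of initWithData:. Note that parse runs synchronously, so it is usually better called off the main thread.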
Try TouchXML:
TouchXML is a lightweight replacement for Cocoa's NSXML* cluster of classes. It is based on the commonly available Open Source libxml2 library.
Here is a nice tutorial.
Source has moved a bit, can be found here

How to get and parse JSON using Objective-C?

Is it possible to get and parse JSON using Objective-C, then manipulate it within the Cocoa framework for the iPhone/iPad? I'm specifically looking to do this for a couple of public APIs out there.
See here: how to do json parsing in iphone
Basically, you should look into the TouchJSON library (with CJSONDeserializer and CJSONSerializer).
I've used json-framework on some previous projects, and it worked really well.
EDIT: I read your post a bit too fast. I've used it in a Mac app before, but not when targeting the iPhone/iPad. I think it should work, but I have no experience with it there. Maybe someone else can confirm?
It's not only possible, it's dirt simple if you use one of the many existing open source projects dedicated to this task. I recommend trying yajl-objc, which offers a streaming parser, but json-framework is a good one too. They're very similar.
I'd stay away from TouchJSON, since it gave me trouble a while back with special characters (line breaks) in strings.
However, I'll join the choir recommending json-framework. Since I switched to that from TouchJSON everything's been running smoothly.
Regarding how to integrate the API in your project, they're equally simple to include and use.
As a side note, I'm just now testing out JSONKit, since it's supposed to be much faster than both TouchJSON and json-framework. However, I can't vouch for its stability yet. The reviews of it are good, though.
If you're developing an application that is only iOS 5.0 or later, you can use NSJSONSerialization.
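A minimal sketch of the NSJSONSerialization approach, assuming the response is a JSON object with a string field called "name" (both the field and the helper NameFromJSONData are hypothetical):

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the "name" string from a JSON object,
// or nil if the data is not valid JSON or has a different shape.
static NSString *NameFromJSONData(NSData *data) {
    if (data == nil) return nil; // JSONObjectWithData: raises on nil data

    NSError *error = nil;
    id object = [NSJSONSerialization JSONObjectWithData:data options:0 error:&error];
    if (object == nil) {
        NSLog(@"JSON parsing failed: %@", error);
        return nil;
    }

    // The top level may be an array or a dictionary, so check before indexing.
    if (![object isKindOfClass:[NSDictionary class]]) return nil;

    id name = ((NSDictionary *)object)[@"name"];
    return [name isKindOfClass:[NSString class]] ? name : nil;
}
```

The parsed result is made of ordinary Foundation objects (NSDictionary, NSArray, NSString, NSNumber, NSNull), so it can be handled like any other Cocoa collection once the types have been checked.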

How do you read/write/update m4u and mp3 file meta data using cocoa/objective c?

Are there particular libraries available on OS X that are relevant? I am just not sure where to start.
You'd probably want to use QuickTime for that. There is some sample code that does this. However, it's not the nicest way to access metadata. The newer QTKit framework somehow still requires you to fall back to the C-based APIs. There is another example from Apple that embeds metadata writing in an Objective-C method. This might be the best starting point for you.

parse mail to fetch attachments

How do I parse an email message in Cocoa?
I've read the NSScanner tutorial, but struggled.
Do you know any better way than NSScanner?
Is there any sample code?
My example:
http://pastie.org/private/pordph27stkwkyvrx2tiq
Regards
If you cannot find any Cocoa libraries to do the job, you can always use C++ or C libraries for your task, e.g. Scaling Web's Parser. Apple has documentation on how to use C++ from Objective-C.
I use C-Client. It's C only, a bit hard to understand but it gets the job done.
I wouldn't take on writing a MIME parser myself - it's lots of work if you look at the RFCs that come into play.