Objective-C: get an HTML page's links - objective-c

I'm quite new to Objective-C programming and I'm trying to make an application that returns all the link addresses in an HTML page. In this case I shouldn't just parse the HTML, but get these links by intercepting them from the page's network requests.
Is it possible to intercept the application's network requests or something?
Thanks

Coincidentally, Ray Wenderlich's rather AWESOME iOS tutorial site posted this article in the last hour. As you are new to iOS/ObjC, I highly recommend reading it thoroughly.
Let’s say you want to find some information inside a web page and display it in a custom way in your app. This technique is called “scraping.” Let’s also assume you’ve thought through alternatives to scraping web pages from inside your app, and are pretty sure that’s what you want to do. Well then you get to the question – how can you programmatically dig through the HTML and find the part you’re looking for, in the most robust way possible? Believe it or not, regular expressions won’t cut it!
And before you think Regular Expressions might really be an answer, please read this.
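The linked tutorial walks through this in Objective-C; purely to illustrate the "use a real parser, not regular expressions" point in a language-agnostic way, here is a minimal sketch using Python's standard library (the URL is just a placeholder):

```python
# Minimal sketch: collect every <a href="..."> from a page with a real HTML
# parser instead of regular expressions. Python stdlib only; the URL below is
# a placeholder, not from the question.
from html.parser import HTMLParser
from urllib.request import urlopen

class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Each anchor tag with an href contributes one link to the result.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

url = "https://example.com/"
collector = LinkCollector()
collector.feed(urlopen(url).read().decode("utf-8", errors="replace"))
print(collector.links)
```

The same shape carries over to Objective-C: feed the downloaded HTML to a proper parser and pull the anchors out of the parse events or the DOM, rather than pattern-matching the raw markup.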

Related

I don't understand the Dojo documentation

I'm a beginner with Dojo. First of all, is everything JavaScript-based? For example, to create a form do I have to use JavaScript or HTML tags?
Also, I can't understand their documentation and tutorials. It's all very confusing.
Is there a proper website (other than Dojo itself) that has good tutorials?
You can use Dojo's components (widgets) in two ways: programmatically and declaratively. The programmatic way (what you are talking about) defines widgets through JavaScript. The declarative way defines them using HTML markup. David Walsh has a nice short write-up, and if you search for "declarative programmatic dojo" you'll find some questions and answers on the matter:
https://davidwalsh.name/dojo-widget
Difference between programmatically vs declaratively created widgets in dojo?
Declarative coding or programmatic coding in Dojo Projects?
Declarative or programatic approach in DOJO?
If you're having trouble with the tutorials on the Dojo website, I suspect you're better off first diving into some basic beginner JavaScript tutorials before trying to learn a framework like Dojo. I concur with the comment Ferry made on your question: there are no better resources than the actual Dojo website. I recommend following every tutorial, starting with the Hello Dojo tutorial and working your way up, so that you don't miss out on the basic concepts you'll need when you read the harder tutorials. Good luck!
For your first question: Dojo is a JavaScript-based platform that provides you with a basic JavaScript library and a bunch of basic widgets (UI controls like buttons, dialogs, layouts, ...), plus some extras. However, you don't have to use Dojo for everything: you can still use Dojo to manipulate a plain HTML form button; it's just that a Dojo button comes with extra functionality and might save you some time.
For the second question, I agree with iH8 that the Dojo website is the best place to start. There are three different ways the Dojo website can help you:
Look at the tutorials: basic steps on how to set things up and use the provided functionality as-is.
Look at the toolkit API: this gives a very detailed view of the Dojo JavaScript objects (see what extra things you can do with them).
Look at the nightly tests: I found these very helpful in terms of showing what can be done beyond the tutorials (i.e. how to use the things you find in the API).
Other than these, you can look at existing implementations to learn about the toolkit.
Basically, this is how I am learning Dojo. Without a more specific question, it's hard to tell what is confusing about the tutorials. I would recommend you give them a try and post a question: everyone here will be willing to help you.
I recommend starting with a video tutorial like this one.
When you understand the concepts, you can try copying and pasting some code from the Dojo documentation tutorials or the Reference Guide, because all the books are out of date.
You may also find useful information on IBM-related sites like http://www.ibm.com/developerworks, because IBM has invested in Dojo and uses it in its products.
If you have enough resources ($), you can take part in workshops (sitepen.com/workshops).

How does Safari's reader feature work?

I want to add a similar feature to a tool I'm making. I'm interested in how it works code-wise. I want to be able to get an HTML page and exclude everything but the article.
The Readability project does something similar for Chrome and iOS. I'm not sure how it detects the content automatically, but I know that Readability has an API for people who want to integrate its features. You might want to check that out.
http://www.readability.com/learn-more
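For what it's worth, one common heuristic in reader-style extractors (an educated guess at the general approach, not a description of Readability's actual code) is to score block-level elements by how much plain text they hold versus how much of that text sits inside links, then keep the best-scoring block. A much-simplified sketch in Python:

```python
# Very rough "reader mode" heuristic: score each <div>/<article> by the amount
# of text it holds, penalising text inside links (link-heavy blocks are usually
# navigation or footers, not the article). Stdlib only; weights are arbitrary.
from html.parser import HTMLParser

class BlockScorer(HTMLParser):
    def __init__(self):
        super().__init__()
        self.open_blocks = []   # per open block: [text_chars, link_chars, text_parts]
        self.link_depth = 0
        self.best = (0, "")     # (score, extracted text)

    def handle_starttag(self, tag, attrs):
        if tag in ("div", "article"):
            self.open_blocks.append([0, 0, []])
        elif tag == "a":
            self.link_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth:
            self.link_depth -= 1
        elif tag in ("div", "article") and self.open_blocks:
            text_chars, link_chars, parts = self.open_blocks.pop()
            score = text_chars - 3 * link_chars   # penalise link-heavy blocks
            if score > self.best[0]:
                self.best = (score, "".join(parts))

    def handle_data(self, data):
        length = len(data.strip())
        for block in self.open_blocks:        # text counts toward every open block
            block[0] += length
            block[2].append(data)
            if self.link_depth:
                block[1] += length

# Hypothetical usage: pull the main text block out of a saved page.
scorer = BlockScorer()
scorer.feed(open("article.html", encoding="utf-8").read())
print(scorer.best[1][:500])
```

Real implementations add a lot on top of this (per-paragraph scoring, class-name hints, propagating scores to parent elements), but text-versus-link-density scoring is a large part of the idea.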
If you're working with Ruby, you could use Pismo. It extracts an article from a given document.

Creating a proxy for IE (or other browsers)

I'm using VB.NET. I need to create an application that sits between the browser and the actual internet. Basically, I'm creating an online game that edits some incoming web pages so that they contain parts of the game (it's a kind of scavenger hunt). How would I create this?
Does anybody have any ideas for this? I've found nothing online. If you do know something about this, I'd prefer code examples and not just pointers to topics. I tend to need a big push in a direction to learn something new.
Thanks if you can!
Your best bet is to start with FiddlerCore, a .NET class library that provides exactly what you're looking for: http://fiddler.wikidot.com/fiddlercore
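FiddlerCore is the .NET route the question calls for. Just to make the underlying idea concrete (a proxy is a program the browser sends its requests to, which fetches the real page and can rewrite it before handing it back), here is a minimal sketch in Python rather than VB.NET; the port and the injected marker text are made-up examples, and HTTPS is not handled:

```python
# Minimal intercepting HTTP proxy sketch (concept demo only; use FiddlerCore for
# a real .NET solution). Configure the browser's HTTP proxy as localhost:8080,
# and every plain-HTTP page it loads gets a line of text injected into it.
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class RewritingProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # With a proxy-configured browser, self.path is the full URL requested.
        with urllib.request.urlopen(self.path) as upstream:
            body = upstream.read()
            content_type = upstream.headers.get("Content-Type", "")
        if "text/html" in content_type:
            # This is where game content would be spliced into the page.
            body = body.replace(b"</body>", b"<p>Scavenger hunt clue!</p></body>")
        self.send_response(200)
        self.send_header("Content-Type", content_type or "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    ThreadingHTTPServer(("127.0.0.1", 8080), RewritingProxy).serve_forever()
```

FiddlerCore gives you the same request/response hooks without having to write the proxy plumbing yourself, plus HTTPS support.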

Dynamic web page convertible to PDF

I'm thinking about writing a professional CV page that would be easy to update, using a simple backend to add information and blocks of optional details, and... (feature creep coming).
Anyway, I was thinking of a graphically "simple" web page that would be easy to convert to a PDF file, using browser functionality or not.
Assuming that the page has blocks of text that you must click a button to see (those are the optional details), what should I know, or what tools should I use, to write this web page?
I'm totally rusty on web code; I used PHP without Ajax a lot before, but I understand the idea. I was thinking maybe it would be a good opportunity to try a framework and make a "web app", like Ruby + Rails or Python + Django. Is that a good idea? I'm ready to learn about those; I'm just not sure if it's worth it for such a project.
Is there anything about HTML or JavaScript behaviour that I should avoid because it would break a PDF generation tool, or anything like that?
Any advice on the way to proceed would be helpful.
You'll want to read up on how to create a print stylesheet. That way, when you go to print the CV, you can choose something like CutePDF Writer and your print stylesheet will automatically be used. Make your stylesheet show all the hidden text blocks and hide things like navigation, buttons, etc.
I can't tell you whether or not it's worth trying a new framework for this project; that's up to you. It's never bad to learn new things, but since I don't know all the details of your project it's hard to say whether it's worth it here. From your description it sounds like you're just making an HTML resume/CV, which sounds, to me, like one flat HTML page with some JavaScript. If that's the case, you could probably just use a text editor.
If you want my personal opinion, ASP.NET 4 is the way to go if you want to learn something new (or if you just want to use a great framework).
As far as breaking PDF generation goes, your print stylesheet will be responsible for showing/hiding things, but any JavaScript should be aware of this as well. Check the link I gave you above for more information.
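If you later want to automate the PDF step instead of printing from the browser, there are scriptable HTML renderers that honour print stylesheets; for example (a sketch, assuming WeasyPrint is installed and the page lives in a file called cv.html, both hypothetical here):

```python
# Sketch: render the CV page straight to PDF with WeasyPrint
# (pip install weasyprint). It applies CSS, including @media print rules,
# but runs no JavaScript -- so the print stylesheet must already reveal
# the optional blocks, exactly as described above.
from weasyprint import HTML

HTML("cv.html").write_pdf("cv.pdf")  # input and output file names are placeholders
```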

Script or piece of code to get a quick list of links per page in a website

How can I quickly produce a report of a website in the format:
Page Name.
- Links within the page
Page Name.
- Links within the page
Any programming or scripting language will do.
Although I'd prefer a solution on Windows, we have Windows, Mac and Linux machines available in the office.
Just looking for a way to do it without much fanfare.
There might be tools that can do this for you, but it isn't all that hard to put together yourself. One possible solution would be to...
Use wget (available for Windows) to download all the HTML files, and
use an XPath tool, or grep with regexes, to get the title and the links from each page.
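If you'd rather do both steps in one small script, here is a sketch using Python's standard library that fetches each page and prints its title followed by the links it contains; the URL list is a placeholder for the pages of your site:

```python
# Sketch: produce the "Page Name / links within the page" report with the
# Python standard library. Replace the placeholder URLs with the real pages.
from html.parser import HTMLParser
from urllib.request import urlopen

class PageReport(HTMLParser):
    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

for url in ["https://example.com/", "https://example.com/about"]:
    report = PageReport()
    report.feed(urlopen(url).read().decode("utf-8", errors="replace"))
    print(report.title.strip() or url)
    for link in report.links:
        print("  -", link)
```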
There are loads of link analysers that will do exactly that. Here's the first one I found on Google.
For something a little more interesting, Don Syme did a great F# demo in which he wrote a really simple async URL-processing class. I can't find the exact link, but here's something similar from an F# MVP. You would need to adapt it to pull out links, and recursively follow them if you want nesting.