parsing html data iphone xpath - objective-c

I have this webpage: http://www.westminster.ac.uk/schools/computing/undergraduate. I'm using hpple to retrieve data (I've just started learning about it). I specifically want to retrieve the hrefs from the main page. How can I do this?
I have this line, "NSArray *elements = [xpathParser search:@"//a"];", which retrieves all of the href links within the page. However, how can I retrieve just the ones in the main content, e.g. "BSc Honors Business Information Systems"? What's the syntax for it?

It looks like all of the "main content" stuff is found underneath elements with id attributes like "content_div_XXXX" where XXXX is some randomly generated sequence. You might be able to get at what you want using an XPath that looks something like:
//div[starts-with(@id,'content_div')]//a
You should be able to get something like this working, although you'd have to try it out and perhaps tweak it a bit to make it work precisely as you want. Refer to the W3Schools XPath page for a good set of XPath tutorials.
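To see what that XPath selects, here is a minimal Python sketch run against a made-up fragment. The markup, ids and link targets are assumptions for illustration, not the real page, and the helper name main_content_links is hypothetical. Python's standard-library ElementTree supports only a subset of XPath without starts-with(), so the prefix test is done in plain Python here; with hpple you would pass the XPath string itself to search:.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page; the real ids and links differ.
SAMPLE = """
<html><body>
  <div id="nav"><a href="/home">Home</a></div>
  <div id="content_div_1234">
    <ul><li><a href="/courses/bis">Business Information Systems</a></li></ul>
  </div>
  <div id="content_div_5678"><p><a href="/courses/cs">Computer Science</a></p></div>
</body></html>
"""

def main_content_links(markup):
    """Emulate //div[starts-with(@id,'content_div')]//a and return the hrefs."""
    root = ET.fromstring(markup.strip())
    hrefs = []
    for div in root.iter("div"):
        if div.get("id", "").startswith("content_div"):
            # iter("a") walks every descendant <a>, like the trailing //a.
            # Note: nested matching divs would yield duplicates here,
            # whereas a real XPath engine returns each node once.
            hrefs.extend(a.get("href") for a in div.iter("a"))
    return hrefs

print(main_content_links(SAMPLE))
```

The link inside the "nav" div is skipped because its id does not start with "content_div".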

Related

Display an internal link URL on a page/article with Pelican

The usual syntax to make links with Pelican is:
This is [a link]({filename}/foo.md)
That works just fine.
But I'm on a page where I'd like to actually show the URL of the link. That is, I want the generated HTML to be like this:
<p>Here is the link:</p>
https://example.com/foo.html
I tried writing the obvious:
[{filename}/foo.md]({filename}/foo.md)
But that got rendered as:
{filename}/foo.md
I couldn't find anything in the documentation. Is there any way to do that?
I don't think the feature in question was designed to behave that way. If it were me, I would use:
[https://example.com/foo.html]({filename}/foo.md)

Selenium: How to locate all the images on a webpage without knowing their id or name attribute?

Let us say I loaded a URI with Selenium. I have no idea how the elements are named on that page (I do not know the id, name, ... of the elements). I want to download all the pictures that may exist on that webpage. That problem is solved by the first answer to this question. But how can I locate all the pictures on that webpage with Selenium?
I checked answers for similar questions like this one but the answers are not useful.
The easiest way is to find by tag name: all images have an img tag, so you can just get every element on the page with that tag. In Python I believe you would use the following (note: not find_element_by_tag_name, as that would return just one of the elements)
find_elements_by_tag_name
and you will want to find elements with the img tag http://www.w3schools.com/tags/tag_img.asp
If I'm not mistaken, you can do something like this
driver.find_elements_by_tag_name('img')
Hope that helps.
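Putting the two answers together, a minimal Python sketch might look like the following. The helper name image_urls is made up for illustration, and the driver is assumed to be an already-created WebDriver with the page loaded. It uses the generic find_elements(by, value) call because recent Selenium 4 releases removed the find_elements_by_* helpers; "tag name" is the string value of By.TAG_NAME.

```python
def image_urls(driver):
    """Return the src of every <img> element on the currently loaded page."""
    # "tag name" is the locator strategy behind By.TAG_NAME; the plural
    # find_elements returns a list (empty if nothing matches).
    images = driver.find_elements("tag name", "img")
    urls = []
    for img in images:
        src = img.get_attribute("src")
        if src:  # skip <img> tags without a usable src
            urls.append(src)
    return urls

# Usage, assuming Selenium and a browser driver are installed:
#   from selenium import webdriver
#   driver = webdriver.Firefox()
#   driver.get("https://example.com")
#   print(image_urls(driver))
#   driver.quit()
```

The returned URLs can then be passed to whatever download code you already have.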

Get text from a section on some page

I know how to make an API call to get me the text of the whole page, like this, but is there a way (without having to parse through the wiki markup) to only get the text from a certain section?
If you look at the documentation for the revisions module, you'll notice that it has a parameter, rvsection, which is exactly what you want. So, for example, to retrieve the lead section, use
http://en.wikipedia.org/w/api.php?format=xml&action=query&titles=Stack%20Overflow&prop=revisions&rvprop=content&rvsection=0
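If you build the request in code, a small sketch like this one assembles the same query string for any title and section number. The helper name section_query_url is hypothetical, and the https scheme is an assumption; the parameters are exactly those in the URL above.

```python
from urllib.parse import urlencode, quote

API = "https://en.wikipedia.org/w/api.php"

def section_query_url(title, section):
    """Build a revisions query that asks for the content of one section only."""
    params = {
        "format": "xml",
        "action": "query",
        "titles": title,
        "prop": "revisions",
        "rvprop": "content",
        "rvsection": section,  # 0 is the lead section
    }
    # quote_via=quote encodes spaces as %20 rather than "+"
    return API + "?" + urlencode(params, quote_via=quote)

print(section_query_url("Stack Overflow", 0))
```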

Google+ : Multiple +1 on same page, different content

I've tried to find an answer to this (both in the dev docs and here), but with no luck.
The "+1 button" works fine on normal pages (where there's just the single +1). But I have a page with multiple entities (to use the terms of Drupal: A View displaying multiple nodes) where I'd like to add "share buttons". So far I've added Twitter and Facebook.
Twitter is the simplest, as it just takes the string you give it.
Facebook takes a URL, but you can specify your own URL.
When I try to specify my own url for +1 I get this Error:
Unsafe JavaScript attempt to access frame with URL http://one80.seasites.se/whats-up from frame with URL https://plusone.google.com/_/+1/hover?hl=sv&url=http%3A%2F%2Fone80.seasites.se%2Fwhats-up%2Fl%25C3%25B6rdag&t=1342724634133&source=widget&isSet=false&referer=http%3A%2F%2Fone80.seasites.se%2Fwhats-up&jsh=m%3B%2F_%2Fapps-static%2F_%2Fjs%2Fgapi%2F__features__%2Frt%3Dj%2Fver%3Dr4LFRxx-_oY.sv.%2Fsv%3D1%2Fam%3D!ZCfx2q5v6YmYvWjcTQ%2Fd%3D1%2Frs%3DAItRSTNI50TT3SY8R9klRLc_1sBJ5_Rp3g#id=I3_1342724634541&parent=http%3A%2F%2Fone80.seasites.se&rpctoken=619983104&_methods=mouseEvent%2CtrackingEvent%2ConVisibilityChanged%2C_onopen%2C_ready%2C_onclose%2CcloseOrHideThisBubble%2C_close%2C_open%2C_resizeMe%2C_renderstart. Domains, protocols and ports must match.
rs=AItRSTOQ10u7fGwgD-LqzsOa-fsgdlhDCg:173
ec.a.v rs=AItRSTOQ10u7fGwgD-LqzsOa-fsgdlhDCg:173
xh rs=AItRSTOQ10u7fGwgD-LqzsOa-fsgdlhDCg:203
q.get rs=AItRSTOQ10u7fGwgD-LqzsOa-fsgdlhDCg:211
ec.w rs=AItRSTOQ10u7fGwgD-LqzsOa-fsgdlhDCg:173
Rh rs=AItRSTOQ10u7fGwgD-LqzsOa-fsgdlhDCg:208
q.w rs=AItRSTOQ10u7fGwgD-LqzsOa-fsgdlhDCg:220
Rb rs=AItRSTOQ10u7fGwgD-LqzsOa-fsgdlhDCg:30
Xg rs=AItRSTOQ10u7fGwgD-LqzsOa-fsgdlhDCg:187
(anonymous function) rs=AItRSTOQ10u7fGwgD-LqzsOa-fsgdlhDCg:226
To explain why I want to use a separate URL:
every node is something like an event, and every node has its own URL (which contains an image and text/info). So when you click Like (for FB) it gets the title, info and image and includes them in the post (so it says "What's up - Gathering" instead of a generic "What's up" with no image or the same image).
I'd like to accomplish the same with G+.
Is there a way to accomplish this for G+? Have I missed something?
I guess one way to do this is by using an iframe for each of the nodes and pull in a special version of the "node page" with just the g+-button. But that's a pretty nasty hack (and not that fun to set up).
Any ideas are welcome!
The error you're seeing is actually due to an issue in Chrome. The +1 button should automatically recover.
You can explicitly specify target pages by using the href attribute. Your markup will look like this in practice:
<g:plusone href="http://example.com/targeturl"></g:plusone>
Or like this with HTML5 syntax:
<div class="g-plusone" data-href="http://example.com/targeturl"></div>
If these don't work, can you share a link to a page where you're seeing it not work? I can take a look :)

Checking the contents of an embed tag using Selenium

We generate a pdf doc via a call to a web service that returns the path to the generated doc.
We use an embed html tag to display the pdf inline, i.e.
<div id="ctl00_ContentPlaceHolder2_ctl01_embedArea">
<embed wmode="transparent" src="http://www.company.com/vdir/folder/Pdfs/file.pdf" width="710" height="400"/>
I'd like to use selenium to check that the pdf is actually being displayed and if possible save the path, i.e. the src link into a variable.
Anyone know how to do this? Ideally we'd like to be able to then compare this pdf to a reference one but that's a question for another day.
As far as inspecting the pdf from selenium, you're more or less out of luck. The embed tag just drops a plugin into the page, and because a plugin isn't well represented in the DOM, Selenium can't get a very good handle on it.
However, if you're using Selenium-RC you may want to consider getting the src of the embed element, then requesting that URL directly and evaluating the resulting PDF in code. Assuming your embed element looks like this <embed id="embedded" src="http://example.com/static/pdf123.pdf" /> you can try something like this
String pdfSrc = selenium.getAttribute("embedded@src");
Then make a web request to the pdfSrc URL and (somehow) validate that it's the one you want. It may be enough to just check that it's not a 404.
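As a rough illustration of that last step, here is a Python sketch (not Selenium-RC Java, and not part of the original answer) that requests the URL and checks both that it loads and that the response starts with the "%PDF-" header that PDF files begin with. The function names and the timeout value are assumptions for illustration.

```python
import urllib.request

def looks_like_pdf(data):
    """True if the bytes start with the PDF magic header."""
    return data.startswith(b"%PDF-")

def check_pdf_url(pdf_src, timeout=10):
    """Fetch pdf_src and report whether it loaded and looks like a PDF."""
    try:
        with urllib.request.urlopen(pdf_src, timeout=timeout) as resp:
            head = resp.read(1024)  # the header sits at the very start
    except OSError:
        # URLError, HTTPError (e.g. a 404) and socket timeouts
        # are all OSError subclasses.
        return False
    return looks_like_pdf(head)
```

Calling check_pdf_url with the src value read from the embed element gives a simple pass/fail signal; comparing against a reference PDF would need more than this.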