Checking the contains of an embed tag using Selenium - selenium

We generate a pdf doc via a call to a web service that returns the path to the generated doc.
We use an embed html tag to display the pdf inline, i.e.
<div id="ctl00_ContentPlaceHolder2_ctl01_embedArea">
<embed wmode="transparent" src="http://www.company.com/vdir/folder/Pdfs/file.pdf" width="710" height="400"/>
I'd like to use selenium to check that the pdf is actually being displayed and if possible save the path, i.e. the src link into a variable.
Anyone know how to do this? Ideally we'd like to be able to then compare this pdf to a reference one but that's a question for another day.

As far as inspecting the pdf from selenium, you're more or less out of luck. The embed tag just drops a plugin into the page, and because a plugin isn't well represented in the DOM, Selenium can't get a very good handle on it.
However, if you're using Selenium-RC you may want to consider getting the src of the embed element, then requesting that URL directly and evaluating the resulting PDF in code. Assuming your embed element looks like this <embed id="embedded" src="http://example.com/static/pdf123.pdf" /> you can try something like this
String pdfSrc = selenium.getAttribute("embedded#src");
Then make a web request to the pdfSrc url and do (somehow) validate it's the one you want. It may be enough to just check that it's not a 404.

Related

How do I render a PDF from HTML with working named anchors?

Is there a way for a bunch of named anchors in a large html to be clickable within a PhantomJs generated PDF file?
I.e. say I have a table of contents or a list of FAQ questions. When clicking on the question/title - I'm taken to its answer/content within the same HTML file which is great but when the same HTML is rendered into a PDF each named anchor becomes an absolute URL (i.e. http://example.com/render.html#anchor_1) so clicking on it opens a browser with that URL instead of jumping to its content within the PDF file.
So, basically, is it possible (and how?) for a markup like this - https://fiddle.jshell.net/jyjuaaog/ to work within the generated PDF?
BTW, this works great when "printing as a PDF file" in Google Chrome but links end up broken when rendered in PhantomJs so there must be something I'm missing that I can't seem to find in the docs.
Any ideas?
Thanks!
Apparently there's a bug in PhantomJs preventing this. As suggested by PhantomJsCloud a quick-and-dirty workaround would be to replace the links with page links.

How to add a custom image (<xh:img>) to PDF

We would like to add an image to our PDF in Orbeon. We explorered different tags and came up with tag. This worked the way we wanted but this tag keeps the PDF from building. We don't get any (visible) errors but a time-out occurs after couple of seconds.
To cross check: PDF build fine without the xh:img tag.
I was wondering what other options do we have. I thought about a PDF template but we would like to give the form author the option to choose his/hers own jpg from a web resource.
This is on 43PE.
User error yet we didn't change much after all.

How to generate screenshot from stored HTML as a string?

Is it possible to generate a screenshot from a saved HTML string?
(more specifically, I mean very simple pages)
I know there are a lot of webkit based libraries around that generate screenshots but they all seem to expect a URL. What if the HTML content is stored or in a variable?
I'd prefer Python but I'm open to any other solutions.
yes, in java there is a awt control that can do this. you can pass the content of html string and it can generate screenshot.
this is one of a similiar type:
http://code.google.com/p/java-html2image/

how to read/parse dynamically generated web content?

I need to find a way to write a program (in any language) that will connect to a website and read dynamically generated data from the website.
Note that it's dynamically generated--it's not enough to get the source html, because the data I'm interested in is generated via javascript that references back-end code. So when i view the webpage source, I can't see the data. (For example, go to google, and do a search. Check the source code on the search results page. Very little of the data your browser is displaying is reflected in the source--most of it is dynamically generated. I need some way to access this data.)
Pick a language and environment that includes an HTML renderer (e.g. .NET and the WebBrowser control). Use the HTML renderer to get the URL and produce an HTML DOM in memory (making sure that scripting is enabled). Read the contents of the HTML DOM after the renderer has done its work.
Example (you'll need to do this inside a System.Windows.Form derived class):
WebBrowser browser = new WebBrowser();
browser.Navigate("http://www.google.com");
HtmlDocument document = browser.Document;
// extract what you want from the document
I used to have a Perl program to access Mapguide.com to get the drive direction from one location to another location. I parsed the returned page and save to database. If the source never change their format, it is OK. the problem is the source format often change, your parser also need change.
A simple thought: if we're talking about AJAX, you can rather look up the urls for the dynamic data. Then you can use the javascript on the page you're talking about to reformat this.
If you have Firefox/greasemonkey making a DOM dumper should be a simple matter.

How to download file from inside Seam PDF

In out project we are creating a pdf by using seam pdf and storing that pdf in the database.
The user can then search up the pdf and view it in their pdf viewer. This is a small portion of the code that is generated to pdf:
<p:html>
<a:repeat var="file" value="#{attachment.files}" rowKeyVar="row">
<s:link action="#{fileHandler.downloadById()}" value="#{file.name}" >
<f:param name="fileId" value="#{file.id}"/>
</s:link>
</a:repeat>
When the pdf is rendered, a link is generated that points to:
/project/skjenkebevilling/status/status_pdf.seam?fileId=42&actionMethod=skjenkebevilling%2Fstatus%2Fstatus_pdf.xhtml%3AfileHandler.downloadById()&cid=16
As you can see this link doesnt say much, and the servletpath seems to be missing.
If I change /project with the servletpath
localhost:8080/saksapp/skjenkebevilling/status/status_pdf.seam?fileId=42&actionMethod=skjenkebevilling%2Fstatus%2Fstatus_pdf.xhtml%3AfileHandler.downloadById%28%29&cid=16
Than the download file dialog appears. So my question is, does anyone know how I can input the correct link? And why this s:link doesnt seem to work?
If I cannot do that, then I will need to somehow do search replace and edit the pdf, but that seems like a bit of a hack.
(This is running under JBoss)
Thank you for your time....
I found a workaround for this problem.
Seems I have to use s:link together with a normal a href tag.
Only having href tag doesn't work for some reason.
<s:link action="#{fileHandler.downloadById()}" value="#{file.name}" propagation="none">
<f:param name="fileId" value="#{file.id}"/>
</s:link>
<a href="#{servletPath.path}?fileId=#{file.id}&actionMethod=#{path.replace('/','')}%2Fstatus%2Fstatus_pdf.xhtml%3AfileHandler.downloadById()&">
download
</a>
The servletPath.path returns the servlet path ie http://mydomain.com/download.seam
You can decide to put login-required=true on the download.seam if you want users to login before downloading a file.
#Observer("org.jboss.seam.security.loginSuccessful")
public void servletPath() {
HttpServletRequest request = (HttpServletRequest) FacesContext.getCurrentInstance().getExternalContext().getRequest();
this.path = request.getRequestURL().toString().replace("login.seam", "download.seam");
}