How do I render a PDF from HTML with working named anchors? - pdf

Is there a way to make named anchors in a large HTML document clickable within a PDF file generated by PhantomJS?
For example, say I have a table of contents or a list of FAQ questions. Clicking a question or title takes me to its answer or content within the same HTML file, which is great. But when the same HTML is rendered into a PDF, each named anchor becomes an absolute URL (e.g. http://example.com/render.html#anchor_1), so clicking it opens a browser at that URL instead of jumping to the content within the PDF file.
So, basically, is it possible (and how) for markup like this - https://fiddle.jshell.net/jyjuaaog/ - to work within the generated PDF?
BTW, this works great when "printing as a PDF file" in Google Chrome, but the links end up broken when rendered in PhantomJS, so there must be something I'm missing that I can't find in the docs.
Any ideas?
Thanks!

Apparently there's a bug in PhantomJS that prevents this. As suggested by PhantomJsCloud, a quick-and-dirty workaround would be to replace the links with page links.

Related

Is there any way to archive and recover an entire page (with all its HTML, CSS, images, JS, ...) using Selenium chromedriver on Ubuntu?

I'm looking for a way to save the entire state of a web page for archiving purposes.
What I actually want is to save the full rendered result of the page as seen in the browser (not as a screenshot, but as the rendered DOM elements) and restore it in a local environment without network access.
I don't need to preserve any functionality of the page that interacts with other computers. Only the view of the page needs to be archived.
To archive youtube.com's home page, I tried the following:
1. Using Beautiful Soup to get the immediate HTML source
2. Using Python Selenium and chromedriver to get the dynamically loaded HTML source
3. Method 2, plus downloading all .css, .js, and image files referenced by links in the HTML to a local directory
4. Pressing Ctrl+S in Chrome, which downloads the HTML source and several files (.js, .css, .jpg, ...)
None of them worked correctly.
At first the 4th method seemed to work, but I soon found out that it downloads the initial HTML source, not the dynamically loaded one.
Are there any known ways to do this kind of thing (archiving the currently rendered state of a page)?
Thanks in advance.
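One approach that may be worth trying is asking Chrome itself for a single-file snapshot of the current page through the DevTools Protocol. The sketch below assumes Selenium 4 with a Chromium-based driver (which exposes execute_cdp_cmd) and relies on the experimental Page.captureSnapshot command, so its behaviour may vary between Chrome versions. The function names write_mhtml and capture_rendered_page and the output path are made up for illustration.

```python
from pathlib import Path


def write_mhtml(snapshot: str, path: str) -> Path:
    """Write an MHTML snapshot string to disk without altering line endings."""
    out = Path(path)
    # newline="" stops Python from translating the CRLF line endings in MHTML.
    with open(out, "w", encoding="utf-8", newline="") as fh:
        fh.write(snapshot)
    return out


def capture_rendered_page(url: str, out_path: str) -> Path:
    """Load url in Chrome and save the page as one MHTML file.

    Assumption: selenium 4 and a matching Chrome/chromedriver are installed.
    Page.captureSnapshot is an experimental DevTools command.
    """
    from selenium import webdriver  # imported lazily so the helper above works without it

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Pages that load content lazily may need an explicit wait or scroll here.
        result = driver.execute_cdp_cmd("Page.captureSnapshot", {"format": "mhtml"})
        return write_mhtml(result["data"], out_path)
    finally:
        driver.quit()
```

Calling capture_rendered_page("https://www.youtube.com/", "youtube.mhtml") should produce a file that Chrome can open offline. Whether every dynamically loaded resource ends up in the snapshot is something you would have to verify for the page in question.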

How to add a custom image (<xh:img>) to PDF

We would like to add an image to our PDF in Orbeon. We explored different tags and came up with the <xh:img> tag. This worked the way we wanted, but the tag keeps the PDF from building. We don't get any (visible) errors, but a time-out occurs after a couple of seconds.
To cross-check: the PDF builds fine without the xh:img tag.
I was wondering what other options we have. I thought about a PDF template, but we would like to give the form author the option to choose their own JPG from a web resource.
This is on 43PE.
User error yet we didn't change much after all.

Embedding a scrollable document faster than PDF: is it possible?

I have a page with about 10 embedded PDF docs. Is there another option that would use smaller file sizes, or something similar, so they don't bog down the page when it is visited? I could even convert the PDFs to something else if possible. Right now they are all in a jQuery accordion and run with Scribd. Maybe something similar to this would work:
scrolling text box
If you want jQuery UI to download different data for each tab, you need to put an empty div inside each section, then set an on-open handler (I am not familiar with jQuery UI, but maybe the activate event?) that makes an AJAX call to get the relevant PDF and puts it into the empty div.
There are lots of existing questions about AJAX in a jQuery UI accordion that cover this.

Disable browser cache for a PDF displayed in an iframe by means of TCPDF

I have been trying for hours to solve the following caching problem.
My application has the following structure (simplified):
index.php - main page (contains various input fields, a submit button and an iframe for displaying PDF content with the help of TCPDF)
generate.php - generates a PDF file based on the supplied POST parameters and stores the file on the filesystem
viewer.php - displays the PDF document (TCPDF libraries). The iframe loads this script to show the PDF file
The workflow is pretty simple: the user chooses some options and clicks the submit button on the main page. The selected parameters are sent via an AJAX POST to the generate.php script. The script generates the PDF file and stores it on the filesystem. At the end it returns the filename of the newly created or edited file. The filename is picked up in the AJAX callback function, which then refreshes the iframe with the new or edited filename:
viewer.php?filename=NEW_OR_EDITED_FILENAME
Everything works, but when the file is replaced, the browser sometimes (NOT ALWAYS) shows the old PDF file, although the new version is on the hard drive. I tried the following solutions:
1. Adding meta tags that disable caching to the HTML generated by index.php and viewer.php
2. Disabling the cache for jQuery AJAX calls with: jQuery.ajaxSetup({cache: false});
3. Adding a random string to the filename parameter:
viewer.php?filename=FILENAME_RANDOMSTRING
The RANDOMSTRING is then stripped off in the script to extract the filename.
None of these solutions worked for me. The browsers tested are Chrome 25.0.1364.152 and Firefox 19.0. Can someone help me with this?
Thanks in advance
Just had the same problem, but after adding a random string it works perfectly:
<iframe src="file.pdf?=<?=time();?>"></iframe>
After many hours of trying, the solution I found is to actually generate a new file each time (solution 3 from the question, but without removing the random string at the end of the filename). As a result, it was necessary to update the database and delete the old files on every change. My initial intention was to avoid these actions, but unfortunately I found no other solution.
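For reference, the "random string" idea from the answers above can be sketched as a small URL helper. This is only an illustration in Python, not the PHP used in the question. The function name add_cache_buster and the parameter name _ts are made up. The token goes in as a separate query parameter, so the filename value itself is left untouched.

```python
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_cache_buster(url, token=None, param="_ts"):
    """Return url with a changing query parameter so the browser sees a new URL.

    token defaults to the current time in milliseconds; param is a
    hypothetical parameter name that the server is expected to ignore.
    """
    if token is None:
        token = str(int(time.time() * 1000))
    parts = urlsplit(url)
    # Keep the existing parameters, dropping any earlier cache-buster value.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != param]
    query.append((param, token))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    print(add_cache_buster("viewer.php?filename=report.pdf"))
```

Because the token is separate from the filename, viewer.php would not need to strip anything from the filename parameter. Whether this is enough to defeat the caching described in the question is not guaranteed, since the asker reports that a similar attempt did not help in their setup.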

Checking the contents of an embed tag using Selenium

We generate a pdf doc via a call to a web service that returns the path to the generated doc.
We use an embed html tag to display the pdf inline, i.e.
<div id="ctl00_ContentPlaceHolder2_ctl01_embedArea">
<embed wmode="transparent" src="http://www.company.com/vdir/folder/Pdfs/file.pdf" width="710" height="400"/>
</div>
I'd like to use selenium to check that the pdf is actually being displayed and if possible save the path, i.e. the src link into a variable.
Anyone know how to do this? Ideally we'd like to be able to then compare this pdf to a reference one but that's a question for another day.
As far as inspecting the pdf from selenium, you're more or less out of luck. The embed tag just drops a plugin into the page, and because a plugin isn't well represented in the DOM, Selenium can't get a very good handle on it.
However, if you're using Selenium-RC, you may want to consider getting the src of the embed element, then requesting that URL directly and evaluating the resulting PDF in code. Assuming your embed element looks like this: <embed id="embedded" src="http://example.com/static/pdf123.pdf" />, you can try something like this (Selenium-RC attribute locators take the form locator@attribute):
String pdfSrc = selenium.getAttribute("embedded@src");
Then make a web request to the pdfSrc URL and (somehow) validate that it's the one you want. It may be enough to just check that it's not a 404.
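A rough sketch of that last validation step, written in Python rather than the Java of the snippet above: fetch the start of the response and check that it looks like a PDF. The helper names looks_like_pdf and fetch_and_check are made up for illustration. The only thing checked is the %PDF- header marker near the start of the body, which is a weak sanity check and not a comparison against a reference document.

```python
import urllib.error
import urllib.request

PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the PDF header marker appears in the first 1024 bytes."""
    return PDF_MAGIC in data[:1024]


def fetch_and_check(url: str, timeout: float = 10.0) -> bool:
    """Request url and report whether it answers 200 with PDF-looking content."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            head = resp.read(1024)
            return resp.status == 200 and looks_like_pdf(head)
    except urllib.error.URLError:
        # Covers HTTP errors such as 404 as well as connection failures.
        return False
```

Here fetch_and_check would be called with the value read from the embed element's src attribute. If the PDF sits behind a login, the request would also need to carry the session cookies from the browser, which this sketch does not handle.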