Add a hyperlink into a PDF document - pdf

I'm currently extending our custom PDF writer so that it can write links to websites. However, I can't find any documentation on how to place a link into the PDF.
This is what prints the text:
BT
70 50 TD
/F1 12 Tf
(visit my website!) Tj
ET
What I need now is to wrap this in a hyperlink so that the user is redirected to my website when clicking "visit my website!"
Any idea how to do so? I can't use an external tool; I need to know which PDF commands to write to the file, since a lot of documents are generated dynamically using C#. Currently I'm using iTextSharp, but I couldn't find any functionality to write a hyperlink, so I decided to add this functionality myself.

Here's the spec side of things: links are created by placing Link Annotations on the page. The active area of a link annotation is described either by the Rect key or by a set of quadrilaterals. Let's assume that you're working with rectangles. In order to place the link, you'll need a dictionary like this at a minimum:
<< /Type /Annot /Subtype /Link /Rect [ x1 y1 x2 y2 ] >>
(x1, y1) and (x2, y2) describe the corners of the rectangle where the link's active area lives.
For this to work, the dictionary should be an indirect object in the PDF, referenced from your page's Annots array.
If you can create this, you'll get a link on the page that goes nowhere.
To get the link to go somewhere you'll need either a /Dest or an /A entry in the link annot (but not both). /Dest is an older artifact for page-level navigation - you won't use this. Instead, use the /A entry which is an action dictionary. So if you wanted to navigate to the url http://www.google.com, you would make your annotation look like this:
<< /Type /Annot /Subtype /Link /Rect [ x1 y1 x2 y2 ]
/A << /Type /Action /S /URI /URI (http://www.google.com) >>
>>
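To see how the pieces above fit together in an actual file, here is a minimal sketch in Python (standard library only) that writes a one-page PDF by hand. It is not the asker's writer or any library's method: the function name build_pdf_with_link, the object numbering, the Helvetica font, the Letter-sized MediaBox and the rough link rectangle are all assumptions made for illustration. It also assumes the text and URL contain no parentheses or backslashes, which would need escaping inside a PDF literal string.

```python
def build_pdf_with_link(text: str, url: str) -> bytes:
    """Assemble a one-page PDF whose text is covered by a URI link annotation."""
    # Content stream: the same text-showing operators as in the question.
    content = f"BT\n70 50 TD\n/F1 12 Tf\n({text}) Tj\nET".encode("latin-1")

    # Rough active area. The width is a guess (about half an em per glyph at
    # 12 pt), not a measurement; real code should use the font's glyph widths.
    x1, y1 = 70, 46
    x2, y2 = 70 + len(text) * 6, 62

    # Object numbers are fixed by position in this list: 1 = catalog, 2 = page
    # tree, 3 = page, 4 = content stream, 5 = font, 6 = link annotation.
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 5 0 R >> >> "
        b"/Contents 4 0 R /Annots [6 0 R] >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        # The annotation is an indirect object referenced from /Annots above.
        # /Border [0 0 0] is optional and only suppresses the visible border.
        (f"<< /Type /Annot /Subtype /Link /Rect [{x1} {y1} {x2} {y2}] "
         f"/Border [0 0 0] "
         f"/A << /Type /Action /S /URI /URI ({url}) >> >>").encode("latin-1"),
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj" for the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry for the reserved object 0
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```

Writing the returned bytes to a file opened in binary mode should give a document in which the text area is clickable. Note that the annotation knows nothing about the text itself; it is only a rectangle placed over the same region of the page.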
I can't help you specifically with how to do this in iTextSharp. I don't particularly like the model or abstraction that they use. I write a PDF toolkit for Atalasoft and I'll show you how I would do that in my own toolkit. Again, I'm making no effort to conceal that this is a commercial product and it's what I do for a living. I just want you to see that there are other options available.
// make a document, add a font, get its metrics
PdfGeneratedDocument doc = new PdfGeneratedDocument();
string fontResource = doc.Resources.Fonts.AddFromFontName("Times New Roman");
PdfFontMetrics mets = doc.Resources.Fonts.Get(fontResource).Metrics;
// make a page, place a line of text
PdfGeneratedPage page = doc.Pages.AddPage(PdfDefaultPages.Letter);
PdfTextLine line = new PdfTextLine(fontResource, 12.0, "Visit my web site.",
    new PdfPoint(72, 400));
page.DrawingList.Add(line);
// get the bounds of the text we place, make an annotation
PdfBounds bounds = mets.GetTextBounds(12.0, "Visit my web site.");
bounds = new PdfBounds(72, 400, bounds.Width, bounds.Height);
LinkAnnotation annot = new LinkAnnotation(bounds, new PdfURIAction(new Uri("my url")));
page.Annotations.Add(annot);
// save the content
doc.Save("finaldoc.pdf");
The only thing that is "tricky" is that there is a disassociation between what content is on the page and the link annotation - but this is because this is how Acrobat models links. If you were modifying an existing document, you would construct a PdfGeneratedDocument from the existing file/stream, add the annotation and then save.

Currently I'm using iTextSharp - but I couldn't find any functionality to write a hyperlink.
Have a look at the Webified iTextSharp Examples for iText in Action, 2nd Edition, e.g. MovieLinks2.cs or LinkActions.cs:
// create an external link
Chunk imdb = new Chunk("Internet Movie Database", FilmFonts.ITALIC);
imdb.SetAnchor(new Uri("http://www.imdb.com/"));
p = new Paragraph("Click on a country, and you'll get a list of movies, containing links to the ");
p.Add(imdb);
p.Add(".");
document.Add(p);
Thus, it really is simple to add links using iTextSharp.
If you still want to do that manually, have a look at the PDF specification. Section 12.5.6.5 explains Link Annotations, and section 12.6.4.7 shows the URI Actions to use in that link annotation.

Related

Filter out anything but interactive form fields in PDF's

I'm looking for a way to filter out all objects apart from interactive form fields in PDF files.
The programming language isn't too important, but I would love it if I could do it from the Linux command line; I'm pretty much open to anything.
E.g. choose a PDF input file, and output a new PDF file with only the interactive form fields from the first.
The ultimate goal is to be able to take an already printed but unfilled form, and print only the content of the filled-in form fields onto it.
The closest I've gotten is by using ghostscript:
gs -o outfile.pdf -sDEVICE=pdfwrite -dFILTERTEXT -dFILTERIMAGE infile.pdf
But that still leaves a lot of lines in my case, as well as an image despite -dFILTERIMAGE.
There's also a -dFILTERVECTOR option, but sadly it removes the form fields as well.
I'm looking for a way to filter out all objects apart from interactive form fields in PDF files.
First and foremost you have to get rid of the static page content. Using an arbitrary general-purpose PDF library, you can do that by clearing the Contents entry of every page.
E.g. using the Java version of iText7 this can be done as follows:
try (
    PdfReader pdfReader = new PdfReader(SOURCE);
    PdfWriter pdfWriter = new PdfWriter(RESULT);
    PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter)
) {
    for (int pageNr = 1; pageNr <= pdfDocument.getNumberOfPages(); pageNr++) {
        PdfPage pdfPage = pdfDocument.getPage(pageNr);
        pdfPage.getPdfObject().remove(PdfName.Contents);
        pdfPage.getPdfObject().setModified();
    }
}
(RemoveContent test testRemoveAllPageContentStreams)

PDF metadata to open document in Actual Size (100%) view

I am generating a PDF document using jsPDF. Is there a way to store metadata in the PDF document that will force Acrobat to open it in 100% view mode (Actual Size) vs sized to fit?
In other words, does the PDF specification allow this to be specified in the document itself?
This is definitely possible, because a PDF document can contain information on how it should open.
You might create such a document in Acrobat and then find the opening information, and/or you might have a look at the Portable Document Format Reference, which is part of the Acrobat SDK, downloadable from the Adobe website.
However, I don't know whether you can insert that structure into the PDF with your tool.
I figured it out: in the Catalog of the PDF document there is an OpenAction entry where we can specify, among other things, how the viewer shows the file.
I changed this
putCatalog = function () {
    out('/Type /Catalog');
    out('/Pages 1 0 R');
    // #TODO: Add zoom and layout modes
    out('/OpenAction [3 0 R /FitH null]');
    out('/PageLayout /OneColumn');
    events.publish('putCatalog');
},
to this
putCatalog = function () {
    out('/Type /Catalog');
    out('/Pages 1 0 R');
    // #TODO: Add zoom and layout modes
    out('/OpenAction [3 0 R 1 100]'); // changed from the standard code to zoom to 100 % instead of fit to width
    out('/PageLayout /OneColumn');
    events.publish('putCatalog');
},
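For comparison, the PDF specification describes an explicit destination with a fixed zoom as [page /XYZ left top zoom], where zoom is a magnification factor (1 means 100 %) and null means "keep the current value". The sketch below, in Python, only formats such a catalog entry as text. The helper name open_action_entry is hypothetical, and the page reference 3 0 R is simply taken from the snippet above; whether a given generator lets you emit this string is a separate question.

```python
def open_action_entry(page_obj_num: int, zoom_percent: float,
                      left=None, top=None) -> str:
    """Format an /OpenAction catalog entry that uses an /XYZ destination."""
    def num(value):
        # PDF uses the keyword null for "leave this parameter unchanged".
        return "null" if value is None else f"{value:g}"

    zoom = zoom_percent / 100  # /XYZ takes a factor, so 100 % becomes 1
    return (f"/OpenAction [{page_obj_num} 0 R /XYZ "
            f"{num(left)} {num(top)} {num(zoom)}]")
```

Under these assumptions, open_action_entry(3, 100) yields /OpenAction [3 0 R /XYZ null null 1], which asks the viewer to open the page referenced by 3 0 R at actual size.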

How to use PDDestination class in PDFbox?

How do I use the PDDestination class in PDFBox?
Will the getPageNumber() method return the current page number?
Can anyone share their views?
Thanks
The usage of PDDestination and PDAction is very similar to that of PdfDestination and PdfAction in iText.
So you may want to search for iText examples first.
Specifically for PDFBox, the following, for example, makes the document open on page 5:
// PDPageDestination is abstract, so instantiate a concrete subclass.
PDPageDestination dest = new PDPageFitDestination();
// Page numbers are zero-based: when you open this PDF, you will see page 5.
dest.setPageNumber(4);
PDActionGoTo action = new PDActionGoTo();
action.setDestination(dest);
document.getDocumentCatalog().setOpenAction(action);

How to find x,y location of a text in pdf

Is there any tool to find the X-Y location of text content in a PDF file?
Docotic.Pdf Library can do it. See C# sample below:
using (PdfDocument doc = new PdfDocument("your_pdf.pdf"))
{
    foreach (PdfTextData textData in doc.Pages[0].Canvas.GetTextData())
        Console.WriteLine(textData.Position + " " + textData.Text);
}
Try running "Preflight..." in Acrobat and choosing PDF Analysis -> List page objects, grouped by type of object.
If you locate the text objects within the results list, you will notice there is a position value (in points) within the Text Properties -> * Font section.
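Since such positions are reported in points, a small conversion helper can be handy. The Python sketch below relies only on the fact that the default PDF unit is 1/72 inch; the function names are made up for illustration, and the origin flip assumes the usual case where the page origin is the bottom-left corner and the page is not rotated.

```python
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def points_to_mm(value_pt: float) -> float:
    """Convert a length in PDF points (1/72 inch) to millimetres."""
    return value_pt * MM_PER_INCH / POINTS_PER_INCH


def to_top_left_origin(x_pt: float, y_pt: float, page_height_pt: float):
    """Convert a bottom-left-origin PDF position to a top-left origin."""
    return x_pt, page_height_pt - y_pt
```

For example, a position of (70, 50) on a 792 pt high page corresponds to (70, 742) when measured from the top-left corner.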
TET, the Text Extraction Toolkit from the pdflib family of products can do that. TET has a commandline interface, and it's the most powerful of all text extraction tools I'm aware of. (It can even handle ligatures...)
Geometry
TET provides precise metrics for the text, such as the position on the page, glyph widths, and text direction. Specific areas on the page can be excluded or included in the text extraction, e.g. to ignore headers and footers or margins.

Is there a way to add "alt text" to links in PDFs in Adobe Acrobat?

In Adobe Acrobat Pro, it's not that difficult to add links to a page, but I'm wondering if there's also a way to add "alt text" (sometimes called "title text") to links as well. In HTML, this would be done as such:
<a href="..." title="...">link</a>
Then when the mouse is hovering over the link, the text appears as a little tooltip. Is there an equivalent for PDFs? And if so, how do you add it?
Actually, PDF does support alternate text. It's part of Logical Structure, documented in PDF Reference 1.7, section 10.8.2 "Alternate Descriptions":
/Span << /Lang (en-us) /Alt (six-point star) >> BDC (✡) Tj EMC
In PDF syntax, Link annotations support a Contents entry to serve as an alternate description:
/Annots [
<<
/Type /Annot
/Subtype /Link
/Border [1 1 1]
/Dest [4 0 R /XYZ 0 0 null]
/Rect [ 50 50 80 60 ]
/Contents (Link)
>>
]
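When such an annotation is generated programmatically, the Contents value has to be written as a PDF literal string, which means escaping backslashes and parentheses. The following Python sketch shows one way to do that; both function names are hypothetical, the destination is passed through as ready-made PDF syntax, and the text is assumed to be plain ASCII (other characters would need PDFDocEncoding or UTF-16BE handling, which is left out here).

```python
def pdf_literal_string(text: str) -> str:
    """Wrap text in parentheses, escaping the characters PDF treats specially."""
    escaped = (text.replace("\\", "\\\\")   # backslash first, to avoid double escaping
                   .replace("(", "\\(")
                   .replace(")", "\\)"))
    return f"({escaped})"


def link_annotation(rect, contents: str, dest: str) -> str:
    """Format a Link annotation dictionary with a /Contents description."""
    x1, y1, x2, y2 = rect
    return ("<< /Type /Annot /Subtype /Link "
            f"/Rect [{x1} {y1} {x2} {y2}] "
            f"/Dest {dest} "
            f"/Contents {pdf_literal_string(contents)} >>")
```

Calling link_annotation((50, 50, 80, 60), "Link", "[4 0 R /XYZ 0 0 null]") produces a dictionary equivalent to the one above, minus the Border entry.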
Quoting "PDF Reference - 6th edition - Adobe® Portable Document Format - Version 1.7 - November 2006" :
Contents text string (Optional) Text to be displayed for the annotation or, if this type of annotation does not display text, an alternate description of the annotation’s contents in human-readable form. In either case, this text is useful when extracting the document’s contents in support of accessibility to users with disabilities or for other purposes
And later on:
For all other annotation types (Link , Movie , Widget , PrinterMark , and TrapNet), the Contents entry provides an alternate representation of the annotation’s contents in human-readable form
This is displayed well with Sumatra PDF v3.1.2, when a border is present.
However this is not widely supported by other PDF readers.
The W3C, in its PDF Techniques for WCAG 2.0, recommends other ways to provide alternative text on links for accessibility purposes:
PDF11: Providing links and link text using the Link annotation and the /Link structure element in PDF documents
PDF13: Providing replacement text using the /Alt entry for links in PDF documents
No, it's not possible to add alt text to a link in a PDF. There's no provision for this in the PDF reference.
On a related note, links in PDFs and links in HTML documents are handled quite differently. A link in a PDF is actually a type of annotation, which sits on top of the page at specified coordinates and has no technical relationship to the text or image in the document, whereas links in HTML documents bear a direct relationship to the elements they hyperlink.
Alt text, or alternate text, is not the same as title text. Title text is metadata intended for human consumption. Alt text is alternate content that stands in for media in case the media fails to load.
There is also a trick using an invisible form button that doesn't do anything but allows a small popup tooltip text to be added when the mouse hovers over it.
Officially, per PDF 1.7 as defined in ISO 32000-1 14.9.3 (see the Adobe website for a free download of a document that is equivalent to the ISO standard for PDF 1.7), one would provide alternate text for an annotation, for example a Link annotation, by adding a "Contents" key to its data structure and providing the alt text as a text string value for that key.
Unfortunately Acrobat does not seem to provide a user interface to add or edit this "Contents" text string for Link annotations, and even if it is present it will not be used for the tool tip. Instead the tool tip always seems to be the target of the Link annotation, at least if it points to a URL.
So on a visual level one could hack around this by adding some other invisible elements on top of the area of the Link annotation that have the desired behavior. Not a very nice hack, at least for my taste. In addition it does not help with accessibility of the PDF, as it introduces yet another stray object...
Facing the same problem I used the JS lib "pdf-lib" (https://pdf-lib.js.org/docs/api/classes/pdfdocument) to edit the content of the pdf file and add the missing attributes on annotations.
const pdfLib = require('pdf-lib');
const fs = require('fs');

// Build a PDFDict holding the "Alt" and "Contents" entries and return its underlying Map
function getNewMap(pdfDoc, str) {
  return pdfDoc.context.obj({
    Alt: pdfLib.PDFString.of(str),
    Contents: pdfLib.PDFString.of(str)
  }).dict;
}

// CommonJS has no top-level await, so the work happens inside an async function
async function main() {
  // fs.readFile takes a callback; the promise-based variant lives in fs.promises
  const pdfData = await fs.promises.readFile('your-pdf-document.pdf');
  const pdfDoc = await pdfLib.PDFDocument.load(pdfData);
  pdfDoc.context.enumerateIndirectObjects().forEach(_o => {
    const pdfObject = _o[1];
    if (typeof pdfObject?.lookup !== "function") return;
    if (pdfObject.lookup(pdfLib.PDFName.of('Type'))?.encodedName !== "/Annot") return;
    // We use the link URI to implement annotation "Alt" & "Contents" attributes;
    // annotations without a URI action are skipped
    const annotLinkUri = pdfObject.lookup(pdfLib.PDFName.of('A'))?.get(pdfLib.PDFName.of('URI'))?.value;
    if (typeof annotLinkUri !== "string") return;
    const titleFromUri = annotLinkUri.replace(/(http:|)(^|\/\/)(.*?\/)/g, "").replace(/\//g, "").trim();
    // We build the new map with "Alt" and "Contents" attributes and merge it into the "Annot" object dictionary
    const newMap = getNewMap(pdfDoc, titleFromUri);
    pdfObject.dict = new Map([...pdfObject.dict, ...newMap]);
  });
  // We save the file
  const pdfBytes = await pdfDoc.save();
  await fs.promises.writeFile("your-pdf-document-accessible.pdf", pdfBytes);
}

main().catch(console.error);