Possible to put HTML annotations on PDF? - pdf

I know that we can now put text, links and videos..but can we put HTML as annotation as well?
If there's a SDK, please point me to it as well.
I have tried to search as much as possible but couldn't find anything on it.
Updated: okay, here are more details. I'm creating a script to create a PDF from an image, and at the same time have to place annotations on top of the image. When the person click the annotation, the HTML will be shown. I understand there are link annotations and shape annotation, but what I'm looking for is the ability to place HTML markup/codes in the annotation. For example, i would be able to design a simple form or style some text or even a embed YouTube video.
I hope I'm clear.
Thanks!

Here goes a very basic sample code : Please add Itext jar in your project
Code :
import com.itextpdf.text.Document;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.Rectangle;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.Image;
//input is image in String format
public void createfromimage(String input){
Document document = new Document(PageSize.A4.rotate());
document.setMargins(0,0,0,0);
String output = "C:/Users/username/Downloads/text.pdf";
try {
FileOutputStream fileOutputStream = new FileOutputStream(output);
PdfWriter pdfWriter = PdfWriter.getInstance(document, fileOutputStream);
Image image = Image.getInstance(input);
document.setPageSize(new Rectangle(image.getWidth(),image.getHeight()));
document.open();
pdfWriter.open();
document.add(image);
document.close();
pdfWriter.close();
} catch (Exception e){
e.printStackTrace();
}
}
You can add annotations in above way, and for Link annotations, refer to link below :
https://pdfbox.apache.org/apidocs/org/apache/pdfbox/pdmodel/interactive/annotation/PDAnnotationLink.html
please note, this is just a simple example.

Related

Adding an Annotation to a PdfFormXObject so the Annotation is reusable

I'm using iText 7 to construct reusable PDF components that I reuse across multiple pages within a document. I'm using iText-dotnet for this task (v7), using F# as the language. (This shouldn't be hard to follow for non-F# people as it's just iText calls :D)
I know how to add annotations to a Page, that isn't the issue. Adding the annotation to the page is as simple as page.AddAnnotation(newAnnotation).
Where I'm having difficulty, is that there is no "Page" associated with a Canvas when you are using a PdfFormXObject() to render a Pdf fragment.
let template = new PdfFormXObject(rect)
let templateCanvas = PdfCanvas(template, pageContext.Canvas.GetPdfDocument())
let newCanvas = new Canvas(templateCanvas, rect)
Once I have the new Canvas, I try to write to the Canvas and add the Annotation via Page.AddAnnotation(). The problem is that there is no Page attached to the PdfFormXObject!
// Create the destination and annotation (destPage is the pageNumber)
let dest = PdfExplicitDestination.CreateFitB(destPage)
let action = PdfAction.CreateGoTo(dest)
let annotation = PdfLinkAnnotation(rect)
let border = iText.Kernel.Pdf.PdfAnnotationBorder(0f, 0f, 0f)
// set up the Annotation with action and display information
annotation
.SetHighlightMode(PdfAnnotation.HIGHLIGHT_PUSH)
.SetAction(action)
.SetBorder(border)
|> ignore
// Try adding the annotation to the page BOOM! (There is *NO* page (null) associated with newCanvas)
newCanvas.GetPage().AddAnnotation(annotation) |> ignore // HELP HERE: Is there another way to do this?
The issue is that I do not know of a different way to set the Annotation on the canvas. Is there a way to render the annotation and just add the annotation directly to the canvas as raw PDF instructions?
Alternatively, is there a way create a different reusable PDF fragment in iText so I can also reuse the GoTo annotation.
N.B. I could split off the annotations and then apply them every time I use the PdfFormXObject() on a new page, but that sort of defeats the purpose of reusing Pdf fragments (template) in my final PDF to reduce it's size.
If you can point me in the right direction, that would be great.
Again, this is not how to add an annotation to a Page(), that's easy. It's how to add an annotation to a PdfFormXObject (or similar mechanism that I'm unaware of for constructing rusable Pdf fragments).
-- As per John's comments below:
I cannot seem to find any reference to single use annotations.
I'm aware of the following example link, so I modified it to look like this:
private static void Main(string[] args)
{
try
{
PdfDocument pdfDocument = new PdfDocument(new PdfWriter("TestMultiLink.pdf"));
Document document = new Document(pdfDocument);
string destinationName = "MyForwardDestination";
// Create a PdfStringDestination to use more than once.
var stringDestination = new PdfStringDestination(destinationName);
for (int page = 1; page <= 50; page++)
{
document.Add(new Paragraph().SetFontSize(100).Add($"{page}"));
switch (page)
{
case 1: // First use of PdfStringDestination
document.Add(new Paragraph(new Link("Click here for a forward jump", stringDestination).SetFontSize(20)));
break;
case 3: // Re-use the stringDestination
document.Add(new Paragraph(new Link("Click here for a forward jump", stringDestination).SetFontSize(10)));
break;
case 42:
pdfDocument.AddNamedDestination(destinationName, PdfExplicitDestination.CreateFit(pdfDocument.GetLastPage()).GetPdfObject());
break;
}
if (page < 50)
document.Add(new AreaBreak(AreaBreakType.NEXT_PAGE));
}
document.Close();
}
catch (Exception e)
{
Console.WriteLine($"Ouch: {e.Message}");
}
}
If you dig into the iText source for iText.Layout.Link, you'll see that the String Destination is added as an Annotation. Therefore, I'm not sure if John's answer is true anymore.
Does anyone know how I can convert the Annotation to a Dictionary and how I would go about adding the PdfDictionary (raw) info into the PftFormXObject?
Thanks
#johnwhitington is correct.
Per PDF specification, annotations can only be added to a page, they cannot be added to a form XObject. It is not a limitation of iText or any other PDF library.
Annotations cannot be reused, each annotation is a distinct object.

Generate PDF from gsp page

I am using grails 2.5.2.
I have created a table which shows all the data from database to gsp page and now i need to save that shown data in a pdf format with a button click.What will be the best way to show them into a PDF and save it to my directory. please Help
You can use itext for converting HTML into pdf using the code below:
public void createPdf(HttpServletResponse response, String args, String css, String pdfTitle) {
response.setContentType("application/force-download")
response.setHeader("Content-Disposition", "attachment;filename=${pdfTitle}.pdf")
Document document = new Document()
Rectangle one = new Rectangle(900, 600)
document.setPageSize(one)
PdfWriter writer = PdfWriter.getInstance(document, response.getOutputStream())
document.open()
ByteArrayInputStream bis = new ByteArrayInputStream(args.toString().getBytes())
ByteArrayInputStream cis = new ByteArrayInputStream(css.toString().getBytes())
XMLWorkerHelper.getInstance().parseXHtml(writer, document, bis, cis)
document.close()
}
Though answering this question late,take a look at grails export plugin.It will be useful if you want to export your data to excel and pdf( useful only if there is no in pre-defined template to export).
Got idea from itext. Used itext 2.1.7 and posted all the values to pdf from a controller method. Used images as background and paragraph and phrase to show values from database.

PDFBox getText not returning all of the visible text

I am using PDFBox to extract text from my PDF document. It retrieves the text, but not all of it (specifically, seems like title/header and footer texts are missing). The parts that are missing are not images and are extracted when using text view in foxit reader.
I am using version 1.8.12 and made a test case with 2.0.2 just to see if it would return more of the content.
This is the code i used for 2.0.2:
public static void main(String[] args) {
File file = new File("D:\\\\file.pdf");
try {
PDDocument doc = PDDocument.load(file);
PDFTextStripper stripper = new PDFTextStripper();
//stripper.setSuppressDuplicateOverlappingText(false);
stripper.getText(doc);
} catch (Exception e) {
System.out.println("Exc errirs ");
}
}
Now I wonder are there any settings I missed? Is PDFBox failing because text is on top of some decorative elements (rectangle under text)?
Thanks
EDIT: link to file in question
As discussed in the comments, the text wasn't missing, but at the "wrong" position. By default, PDFBox text extraction extracts the characters as they come in the content stream, but they don't always come in a "natural" way. PDF files are created by software, not by humans.
An alternative is to use the sort option:
stripper.setSortByPosition(true)
However, as mkl pointed out, if the text is in two columns, you won't like the result either.

How to add watermark to a landscape file using pdfbox

I'm using pdfbox 1.8.11 and FOP to add water mark to pdf:s. It works nicely to most input pdf files.
However I get a problem when the file is in landscape, the watermarking will be 90 degree right rotated.
I had similar problem with visible signature, it is fixed. thanks to the solution in sign landscape file . Any idea how to make water mark rotation works? Thanks in advance!
The original picture for watermark is:
Up arrow
After FOP watermark the image is rotated:
image rotated
apologize for answer late.
The idea for 'water mark' here to add add some transforms into the original pdf using fop apache fop. You can fine java code example and fo template example from apache fop website.
In any case i will illustrate the example here too:
1. the java code of how to use fop
import org.apache.fop.apps.*;
import org.xml.sax.*;
import java.io.*;
import javax.xml.transform.*;
import javax.xml.transform.sax.*;
import javax.xml.transform.stream.*;
class rendtest {
private static FopFactory fopFactory = FopFactory.newInstance(new File(".").toURI());
private static TransformerFactory tFactory = TransformerFactory.newInstance();
public static void main(String args[]) {
OutputStream out;
try {
//Load the stylesheet
Templates templates = tFactory.newTemplates(
new StreamSource(new File(args[1])));
//First run (to /dev/null)
out = new org.apache.commons.io.output.NullOutputStream();
FOUserAgent foUserAgent = fopFactory.newFOUserAgent();
Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF, foUserAgent, out);
Transformer transformer = templates.newTransformer();
transformer.setParameter("page-count", "#");
transformer.transform(new StreamSource(new File(args[0])),
new SAXResult(fop.getDefaultHandler()));
//Get total page count
String pageCount = Integer.toString(driver.getResults().getPageCount());
//Second run (the real thing)
out = new java.io.FileOutputStream(args[2]);
out = new java.io.BufferedOutputStream(out);
try {
foUserAgent = fopFactory.newFOUserAgent();
fop = fopFactory.newFop(MimeConstants.MIME_PDF, foUserAgent, out);
transformer = templates.newTransformer();
transformer.setParameter("page-count", pageCount);
transformer.transform(new StreamSource(new File(args[0])),
new SAXResult(fop.getDefaultHandler()));
} finally {
out.close();
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
for the problem i had for rendering landscape pdf:s, in fop template you only need to add one more attribute to tell this file is in landscape layout.
The attribute is to set reference-orientation="90". Then your other definitions in the fop template will be applied properly.

Crop PDF & add margins

I have a PDF with a CropBox size of 6" wide x 9" high. I need to add it to a standard letter-sized PDF. If I change the CropBox size, then the cropmarks become visible. So ideally what I'd like to do is crop out just the visible portion of the page, then pad the sides so that the total height and width is letter-sized.
Is this possible using PDFBox or another Java class?
Have you found an answer to your problem ? I have been facing the same scenario this week.
I have a standard letter-size (8,5" x 11") PDF A, containing a header, a footer, and a form. I have no control over that PDF's generation, so the header and footer are a bit dirty and I need to remove them. My first approach was to extract the form into a Box (any type of box works), and then export it as a new PDF page. Problem is, my new Box is a certain size (let's say 6" x 7"), and after thorough research into the docs, I was unable to find a way to embed it into a 8,5" x 11" PDF B ; the output PDF was the same size as my Box. All scenarios either led to a blank PDF file of the right size, or a PDF containing my form but of wrong dimensions.
I then had no choice but to use another approach. It isn't very clean, but hey, when working with PDFs, black magic and workarounds are the main topic. I simply kept the original PDF A, and blanked out all the unwanted parts. That means, I created rectangles, filled them with white, and covered up the sections I wanted to hide. Result is a PDF file, of right dimension, containing only my form. Hooray ! Technically, the header and footer are still present in the page, there was no way to actually remove them ; I was only able to hide them (this doesn't make any difference to the end user as long as you're not hiding sensitive data).
I realize your question was submitted 2 years ago, but I had a very hard time finding a proper answer to my question online, so here's me giving back to the community, and hoping I can help future developers save some time. If you actually found a way to extract a box and embed it in a standard-size page, please post your answer !
Here is my code by the way :
import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.*;
import org.apache.pdfbox.pdmodel.edit.PDPageContentStream;
import java.awt.Color;
import java.io.*;
import java.util.List;
// This code doesn't actually extract PDF elements per say
// It fills 2 rectangles in white to hide the header and the footer of our PDF page
public class ex {
// Arbitrary values obtained in a very obscure way
static int PAGE_WIDTH = 615;
static int PAGE_HEIGHT = 815;
#SuppressWarnings("unchecked")
public static void main(String[] args) throws IOException, COSVisitorException {
File inputFile = new File("C:\\input.pdf");
File outputFile = new File("C:\\output.pdf");
PDDocument inputDoc = PDDocument.load(inputFile);
PDDocument outputDoc = new PDDocument();
List<PDPage> pages = inputDoc.getDocumentCatalog().getAllPages();
PDPageContentStream pageCS = null;
// Lets paint our pages white !
for (PDPage page : pages) {
pageCS = new PDPageContentStream(inputDoc, page, true, false);
pageCS.setNonStrokingColor(Color.white);
// Top rectangle
pageCS.fillRect(0, 0, PAGE_WIDTH, 30);
// Bottom rectangle
pageCS.fillRect(0, PAGE_HEIGHT-30, PAGE_WIDTH, 30);
pageCS.close();
outputDoc.addPage(page);
}
// Save to file
outputFile.delete();
outputDoc.save(outputFile);
// Wait until the end to close all documents, or else you get an error
inputDoc.close();
outputDoc.close();
}
}
I have adopted the answer of John a little bit, maybe this will help someone.
I have changed the loop to create a new rectangle, with the wanted dimensions. Then the rectangle is set to the page and afterwards added to the new document. I used this snippet to crop a black border out of a long scanned document.
Notice that this will change the size of the pages.
import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.edit.PDPageContentStream;
import java.io.File;
import java.io.IOException;
import java.util.List;
public class Main {
#SuppressWarnings("unchecked")
public static void main(String[] args) throws IOException, COSVisitorException {
File inputFile = new File("/path/to/your/file");
File outputFile = new File("/path/to/your/file");
PDDocument inputDoc = PDDocument.load(inputFile);
PDDocument outputDoc = new PDDocument();
List<PDPage> pages = inputDoc.getDocumentCatalog().getAllPages();
// Lets paint our pages white !
for (PDPage page : pages) {
PDRectangle rectangle=new PDRectangle();
rectangle.setLowerLeftX(0);
rectangle.setLowerLeftY(0);
rectangle.setUpperRightX(500);
rectangle.setUpperRightY(680);
page.setMediaBox(rectangle);
page.setCropBox(rectangle);
outputDoc.addPage(page);
}
// Save to file
// outputFile.delete();
outputDoc.save(outputFile);
// Wait until the end to close all documents, or else you get an error
inputDoc.close();
outputDoc.close();
}
}
Other than adding a rectangle to the PDPage constructor you can do this do set the CropBox to any size:
PDRectangle box = new PDRectangle(pageWidth, pageHeight);
page.setMediaBox(box); // MediaBox > BleedBox > TrimBox/CropBox