I am using PDFBox for the first time to generate a PDF. I have a text document consisting of a series of about 40 multiple-choice questions generated by my Java program. Some of the questions have small associated images that need to be inserted above the question.
For this reason I am converting the text document to a PDF and hope to insert the images into it.
I have managed to insert an image into the PDF document, but it underlays the text like a background.
I want to place the images in line with the text (like an inline text box in Word).
It seems the image-drawing methods need an absolute position, which will depend on the position of the text.
How can I know where to draw my image?
For info: PDFBox 2.0.7.jar
import ExamDatabase.ReadInputFile;
import java.io.IOException;
import java.util.ArrayList;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
/**
 *
 * @author Steve carr
 */
public class HelloWorldPdf1_1_1
{
    // runs
    /**
     * @param args the command line arguments
     * @throws java.io.IOException
     */
    public static void main(String[] args) throws IOException
    {
        ReadInputFile fileI = new ReadInputFile(); // read the plain text file
        ArrayList<String> localList = fileI.readerNew();
        // Create a document and add a page to it
        try (PDDocument document = new PDDocument())
        {
            PDPage page = new PDPage();
            document.addPage(page);
            // Select some of the PDF base fonts
            PDFont font1 = PDType1Font.HELVETICA; //TIMES_ROMAN;
            PDFont font2 = PDType1Font.TIMES_ROMAN;
            PDFont font3 = PDType1Font.COURIER_BOLD;
            try (PDPageContentStream contentStream = new PDPageContentStream(document, page))
            {
                // Create the PDImageXObject and draw it at an absolute position:
                // x, y of its lower-left corner (from the page's lower-left corner), width, height
                PDImageXObject pdImage = PDImageXObject.createFromFile("C:/PdfBox_Examples/CARD00.GIF", document);
                contentStream.drawImage(pdImage, 100, 500, 50, 70);
                // Write the text lines with font1
                contentStream.beginText();
                contentStream.setFont(font1, 11);
                contentStream.setCharacterSpacing(0);
                contentStream.setWordSpacing(0);
                // Leading = distance between baselines used by newLine(); with 0 all lines overlap
                contentStream.setLeading(14.5f);
                contentStream.newLineAtOffset(100, 700); // sets the start point of the text
                System.out.println("localList.size= " + localList.size()); // just checking bounds during testing
                for (String line : localList)
                {
                    System.out.println(line);
                    contentStream.showText(line);
                    contentStream.newLine();
                }
                contentStream.endText();
            } // try-with-resources closes the content stream here
            // Save the results; try-with-resources closes the document
            document.save("Hello World.pdf");
        }
    }
}
Resulting output, with the text written on top of the image:
As per this PDFBox fix, https://issues.apache.org/jira/browse/PDFBOX-738, transparency is preserved only when RGBA is set. If transparency is preserved, the image will look inline with the other text rather than like an overlay, so this could be a solution for the first part of your problem, i.e. the overlay issue.
This example also shows how to compute the width occupied by a specific text, and thus how to calculate where to place the image right after the text:
https://svn.apache.org/viewvc/pdfbox/trunk/examples/src/main/java/org/apache/pdfbox/examples/interactive/form/DetermineTextFitsField.java?revision=1749360&view=markup
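Since PDFBox only draws at absolute coordinates, one way to get the "inline" effect from the question is to keep your own vertical cursor and move it down by the image height before writing the question text, so that nothing overlaps. The class below is a minimal sketch under stated assumptions: FlowCursor is a hypothetical helper, not a PDFBox class, and it assumes a single column, a fixed leading and an image height already known in points. It only does the arithmetic. With PDFBox 2.0 you would pass the value from `nextImage(...)` as the y of `contentStream.drawImage(pdImage, x, y, w, h)` (outside any text object) and, for each line, open a text object with `beginText()`, position it with `newLineAtOffset(x, cursor.nextLine())`, then call `showText(line)` and `endText()`.

```java
/**
 * Hypothetical helper (not part of PDFBox): keeps a top-down "cursor" so that
 * images and text lines are stacked instead of overlapping. PDF y coordinates
 * grow upwards from the lower-left corner, so every placement moves the cursor down.
 */
public class FlowCursor {
    private final float leading; // distance between text baselines
    private final float bottom;  // lowest usable y (bottom margin)
    private float y;             // current top of the free area

    public FlowCursor(float top, float bottom, float leading) {
        this.y = top;
        this.bottom = bottom;
        this.leading = leading;
    }

    /** Returns the baseline y for the next text line and moves the cursor down to it. */
    public float nextLine() {
        y -= leading;
        return y;
    }

    /** Returns the lower-left y for an image of the given height, then leaves a gap below it. */
    public float nextImage(float height, float gapBelow) {
        y -= height;
        float lowerLeft = y;
        y -= gapBelow;
        return lowerLeft;
    }

    /** True if a block of the given height would run into the bottom margin. */
    public boolean needsNewPage(float height) {
        return y - height < bottom;
    }

    public float current() {
        return y;
    }

    public static void main(String[] args) {
        // Letter page (792 pt high) with 72 pt margins and the 14.5 pt leading from the question
        FlowCursor cursor = new FlowCursor(792 - 72, 72, 14.5f);
        float imageY = cursor.nextImage(70, 6); // 70 pt tall image above the question
        float textY = cursor.nextLine();        // first line of the question
        System.out.println(imageY + " " + textY); // 650.0 629.5
    }
}
```

When `needsNewPage(...)` returns true, the idea is to close the current content stream, add a new PDPage and start a fresh cursor at the top; how to split a question across pages is left to you.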
Related
I created a PDF with iText 7. The PDF has a header on each page, which consists of two (sometimes more) rows. I added them as in the Jump-Start tutorial, chapter 3.
The problem is that no tags are generated, so the screen reader (JAWS) doesn't find the header and blind users cannot access it.
I tried to add some tags manually to mimic a table, but that seems to be ignored completely.
Here is my code to create the PDF:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import com.itextpdf.io.font.PdfEncodings;
import com.itextpdf.kernel.events.Event;
import com.itextpdf.kernel.events.PdfDocumentEvent;
import com.itextpdf.kernel.font.PdfFont;
import com.itextpdf.kernel.font.PdfFontFactory;
import com.itextpdf.kernel.geom.PageSize;
import com.itextpdf.kernel.pdf.*;
import com.itextpdf.kernel.pdf.canvas.PdfCanvas;
import com.itextpdf.layout.Document;
import com.itextpdf.layout.element.Paragraph;
import com.itextpdf.pdfa.PdfADocument;
public class ITextHeader {
private PdfADocument pdf;
private PdfFont bf;
public static void main(String[] args) throws Exception {
new ITextHeader().createPdf();
}
private void createPdf() throws Exception {
PdfWriter writer = new PdfWriter(new FileOutputStream("header.pdf"));
InputStream icm = new FileInputStream("sRGB_CS_profile.icm");
pdf = new PdfADocument(writer, PdfAConformanceLevel.PDF_A_1A,
new PdfOutputIntent("Custom", "", null, "sRGB IEC61966-2.1", icm));
pdf.setTagged();
bf = PdfFontFactory.createFont("arial.ttf", PdfEncodings.IDENTITY_H);
try (Document pdfDocument = new Document(pdf, PageSize.A4, true)) {
pdfDocument.setMargins(100, 15, 50, 15);
pdf.addEventHandler(PdfDocumentEvent.START_PAGE, this::createHeader);
pdfDocument.add(new Paragraph("Here is the content").setFont(bf).setFontSize(10));
}
}
public void createHeader(Event event) {
PdfDocumentEvent docEvent = (PdfDocumentEvent) event;
PdfPage page = docEvent.getPage();
PdfCanvas pdfCanvas = new PdfCanvas(
page.newContentStreamBefore(), page.getResources(), pdf);
pdfCanvas.beginText()
.setFontAndSize(bf, 10)
.beginMarkedContent(PdfName.Table)
.moveText(15, 804)
.beginMarkedContent(PdfName.TR)
.beginMarkedContent(PdfName.TD)
.showText("My Title")
.endMarkedContent() // TD
.moveText(466, 0)
.beginMarkedContent(PdfName.TD)
.showText("Date: 01.01.2022")
.endMarkedContent() // TD
.endMarkedContent() // TR
.moveText(-466, -14)
.beginMarkedContent(PdfName.TR)
.beginMarkedContent(PdfName.TD)
.showText("My Subtitle")
.endMarkedContent() // TD
.moveText(466, 0)
.beginMarkedContent(PdfName.TD)
.showText("Time: 12:30")
.endMarkedContent() // TD
.endMarkedContent() // TR
.endMarkedContent() // TABLE
.endText();
}
}
This is the structure of the PDF as shown by PDF Accessibility Checker:
The Accessibility Checker also complains about untagged content:
We solved the issue with the following workaround: the header on the first page is rendered as a PDF table; on the following pages we use the canvas to display the text.
This solution is somewhat awkward because we have to implement the headers twice with different techniques, but now JAWS at least finds the header on the first page.
I'm using PDFBox 1.8.11 and FOP to add a watermark to PDFs. It works nicely for most input PDF files.
However, I get a problem when the file is in landscape: the watermark is rotated 90 degrees to the right.
I had a similar problem with a visible signature; it is fixed, thanks to the solution in "sign landscape file". Any idea how to make the watermark rotation work? Thanks in advance!
The original picture for watermark is:
Up arrow
After FOP watermark the image is rotated:
image rotated
Apologies for the late answer.
The idea of the "watermark" here is to add some transforms to the original PDF using Apache FOP. You can find a Java code example and an FO template example on the Apache FOP website.
In any case, I will illustrate the example here too:
1. The Java code showing how to use FOP:
import org.apache.fop.apps.*;
import org.xml.sax.*;
import java.io.*;
import javax.xml.transform.*;
import javax.xml.transform.sax.*;
import javax.xml.transform.stream.*;
class rendtest {
private static FopFactory fopFactory = FopFactory.newInstance(new File(".").toURI());
private static TransformerFactory tFactory = TransformerFactory.newInstance();
public static void main(String args[]) {
OutputStream out;
try {
//Load the stylesheet
Templates templates = tFactory.newTemplates(
new StreamSource(new File(args[1])));
//First run (to /dev/null)
out = new org.apache.commons.io.output.NullOutputStream();
FOUserAgent foUserAgent = fopFactory.newFOUserAgent();
Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF, foUserAgent, out);
Transformer transformer = templates.newTransformer();
transformer.setParameter("page-count", "#");
transformer.transform(new StreamSource(new File(args[0])),
new SAXResult(fop.getDefaultHandler()));
//Get total page count from the results of the first run
String pageCount = Integer.toString(fop.getResults().getPageCount());
//Second run (the real thing)
out = new java.io.FileOutputStream(args[2]);
out = new java.io.BufferedOutputStream(out);
try {
foUserAgent = fopFactory.newFOUserAgent();
fop = fopFactory.newFop(MimeConstants.MIME_PDF, foUserAgent, out);
transformer = templates.newTransformer();
transformer.setParameter("page-count", pageCount);
transformer.transform(new StreamSource(new File(args[0])),
new SAXResult(fop.getDefaultHandler()));
} finally {
out.close();
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
For the problem I had with rendering landscape PDFs, you only need to add one more attribute in the FOP template to tell it that the file is in landscape layout.
The attribute is reference-orientation="90". With it, the other definitions in the FOP template are applied properly.
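If only some of the input files are landscape, the program has to decide per file (or per page) whether to use reference-orientation="90". A minimal sketch, assuming "landscape" means the page as it is displayed: a page counts as landscape when its box is wider than tall and unrotated, or taller than wide and rotated by 90 or 270 degrees. The Orientation class is hypothetical; in PDFBox 1.8 the inputs would come from `page.findMediaBox()` (width and height) and `page.findRotation()`. How the value reaches the template (for example as a stylesheet parameter, like page-count in the code above) is also an assumption.

```java
/**
 * Hypothetical helper: decides whether a page is displayed in landscape, i.e. whether
 * the FO template should get reference-orientation="90".
 */
public class Orientation {

    /**
     * @param width    page box width in points
     * @param height   page box height in points
     * @param rotation the page's /Rotate value (a multiple of 90)
     */
    public static boolean isLandscape(float width, float height, int rotation) {
        int r = ((rotation % 360) + 360) % 360; // normalise, e.g. -90 becomes 270
        boolean quarterTurn = r == 90 || r == 270;
        // A quarter turn swaps the displayed width and height
        return quarterTurn ? height > width : width > height;
    }

    /** Value for the reference-orientation attribute. */
    public static String referenceOrientation(float width, float height, int rotation) {
        return isLandscape(width, height, rotation) ? "90" : "0";
    }

    public static void main(String[] args) {
        System.out.println(referenceOrientation(792, 612, 0));  // landscape box: 90
        System.out.println(referenceOrientation(612, 792, 90)); // rotated portrait box: 90
        System.out.println(referenceOrientation(612, 792, 0));  // portrait: 0
    }
}
```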
I have a huge PDF containing more than 1000 pages and need to edit existing text. Of course, the text was added by me, using the PDFBox "add text to each page" example... but the font size was so big that the text runs off the page.
Now I want to decrease the font size so that the text stays within the page limits, or clear the existing text and replace it with the same text in a new font.
Credit to Tilman Hausherr for the answer:
If you used the code you linked to, you will find the "added message" in the content stream array, as the second-last item. Remove it from the array returned by page.getCOSObject().getItem(COSName.CONTENTS) and save the file.
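import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.cos.*;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

public void removeStamp(File src) throws IOException {
    try (PDDocument doc = PDDocument.load(src)) {
        for (PDPage page : doc.getPages()) {
            // /Contents is an array only if the page has several content streams
            COSBase contents = page.getCOSObject().getDictionaryObject(COSName.CONTENTS);
            if (contents instanceof COSArray) {
                COSArray array = (COSArray) contents;
                array.remove(array.size() - 1); // drop the last stream in the array
            }
        }
        doc.save(src);
    }
}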
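For the other option in the question, re-adding the same text with a smaller font, the size that fits can be computed from the font metrics. PDFBox's `PDFont.getStringWidth(String)` returns the width in 1/1000 of a text-space unit, so the width on the page is `units / 1000 * fontSize`. The hypothetical FitFontSize class below solves that for the font size; the 7000-unit value in `main` is an assumed placeholder for what `font.getStringWidth(text)` would return for your stamp text.

```java
/**
 * Hypothetical helper: PDFont.getStringWidth(text) is in 1/1000 of the font size,
 * so the rendered width is units / 1000 * fontSize. Solving for fontSize gives the
 * largest size that still fits the available width.
 */
public class FitFontSize {

    /** Width on the page, in points, of text measuring stringWidthUnits at fontSize. */
    public static float widthInPoints(float stringWidthUnits, float fontSize) {
        return stringWidthUnits / 1000f * fontSize;
    }

    /** Largest font size, capped at preferred, that keeps the text inside availableWidth. */
    public static float fit(float stringWidthUnits, float availableWidth, float preferred) {
        if (stringWidthUnits <= 0) {
            return preferred; // empty text: nothing to shrink
        }
        float max = availableWidth * 1000f / stringWidthUnits;
        return Math.min(preferred, max);
    }

    public static void main(String[] args) {
        float units = 7000f;                     // assumed result of font.getStringWidth(text)
        float available = 612 - 2 * 36;          // letter page width minus 36 pt margins
        float size = fit(units, available, 100); // the oversized font was, say, 100 pt
        System.out.println(size);
        System.out.println(widthInPoints(units, size));
    }
}
```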
I have a PDF with a CropBox size of 6" wide x 9" high. I need to add it to a standard letter-sized PDF. If I change the CropBox size, then the cropmarks become visible. So ideally what I'd like to do is crop out just the visible portion of the page, then pad the sides so that the total height and width is letter-sized.
Is this possible using PDFBox or another Java class?
Have you found an answer to your problem? I have been facing the same scenario this week.
I have a standard letter-size (8.5" x 11") PDF A containing a header, a footer, and a form. I have no control over that PDF's generation, so the header and footer are a bit dirty and I need to remove them. My first approach was to extract the form into a Box (any type of box works) and then export it as a new PDF page. The problem is that my new Box is a certain size (let's say 6" x 7"), and after thorough research into the docs, I was unable to find a way to embed it into an 8.5" x 11" PDF B; the output PDF was the same size as my Box. All scenarios led either to a blank PDF file of the right size, or to a PDF containing my form but with the wrong dimensions.
I then had no choice but to use another approach. It isn't very clean, but hey, when working with PDFs, black magic and workarounds are the main topic. I simply kept the original PDF A and blanked out all the unwanted parts: I created rectangles, filled them with white, and covered up the sections I wanted to hide. The result is a PDF file of the right dimensions containing only my form. Hooray! Technically, the header and footer are still present in the page; there was no way to actually remove them, I was only able to hide them (this doesn't make any difference to the end user as long as you're not hiding sensitive data).
I realize your question was submitted 2 years ago, but I had a very hard time finding a proper answer to my question online, so here I am giving back to the community, hoping I can help future developers save some time. If you actually found a way to extract a box and embed it in a standard-size page, please post your answer!
Here is my code, by the way:
import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.*;
import org.apache.pdfbox.pdmodel.edit.PDPageContentStream;
import java.awt.Color;
import java.io.*;
import java.util.List;
// This code doesn't actually extract PDF elements per se
// It fills 2 rectangles in white to hide the header and the footer of our PDF page
public class ex {
// Arbitrary values obtained in a very obscure way
static int PAGE_WIDTH = 615;
static int PAGE_HEIGHT = 815;
@SuppressWarnings("unchecked")
public static void main(String[] args) throws IOException, COSVisitorException {
File inputFile = new File("C:\\input.pdf");
File outputFile = new File("C:\\output.pdf");
PDDocument inputDoc = PDDocument.load(inputFile);
PDDocument outputDoc = new PDDocument();
List<PDPage> pages = inputDoc.getDocumentCatalog().getAllPages();
PDPageContentStream pageCS = null;
// Let's paint our pages white!
for (PDPage page : pages) {
pageCS = new PDPageContentStream(inputDoc, page, true, false);
pageCS.setNonStrokingColor(Color.white);
// Bottom rectangle (PDF y coordinates start at the lower-left corner)
pageCS.fillRect(0, 0, PAGE_WIDTH, 30);
// Top rectangle
pageCS.fillRect(0, PAGE_HEIGHT-30, PAGE_WIDTH, 30);
pageCS.close();
outputDoc.addPage(page);
}
// Save to file
outputFile.delete();
outputDoc.save(outputFile);
// Wait until the end to close all documents, or else you get an error
inputDoc.close();
outputDoc.close();
}
}
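The 615 x 815 values above are hard-coded. A sketch of deriving the two white bands from the page's own MediaBox instead, assuming the 30 pt band height from the code above and PDF's lower-left origin (so the footer band starts at the box's lower-left y and the header band ends at its top). In PDFBox 1.8 the box values would come from `page.findMediaBox()` (`getLowerLeftX()`, `getLowerLeftY()`, `getWidth()`, `getHeight()`); MaskBands itself is a hypothetical helper, and each array is in the `{x, y, width, height}` order that `fillRect` takes.

```java
import java.util.Arrays;

/**
 * Hypothetical helper: computes the white "mask" rectangles from the page box
 * instead of hard-coded page dimensions. Each result is {x, y, width, height}.
 */
public class MaskBands {

    /** Band along the bottom edge of the box (the footer area). */
    public static float[] footer(float llx, float lly, float width, float bandHeight) {
        return new float[] {llx, lly, width, bandHeight};
    }

    /** Band along the top edge of the box (the header area). */
    public static float[] header(float llx, float lly, float width, float height, float bandHeight) {
        return new float[] {llx, lly + height - bandHeight, width, bandHeight};
    }

    public static void main(String[] args) {
        // Letter page: MediaBox [0 0 612 792], 30 pt bands as in the code above
        System.out.println(Arrays.toString(header(0, 0, 612, 792, 30))); // [0.0, 762.0, 612.0, 30.0]
        System.out.println(Arrays.toString(footer(0, 0, 612, 30)));      // [0.0, 0.0, 612.0, 30.0]
    }
}
```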
I have adapted John's answer a little; maybe this will help someone.
I changed the loop to create a new rectangle with the wanted dimensions. The rectangle is then set on the page, and the page is afterwards added to the new document. I used this snippet to crop a black border out of a long scanned document.
Note that this changes the size of the pages.
import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.edit.PDPageContentStream;
import java.io.File;
import java.io.IOException;
import java.util.List;
public class Main {
@SuppressWarnings("unchecked")
public static void main(String[] args) throws IOException, COSVisitorException {
File inputFile = new File("/path/to/your/file");
File outputFile = new File("/path/to/your/file");
PDDocument inputDoc = PDDocument.load(inputFile);
PDDocument outputDoc = new PDDocument();
List<PDPage> pages = inputDoc.getDocumentCatalog().getAllPages();
// Give every page the wanted MediaBox and CropBox
for (PDPage page : pages) {
PDRectangle rectangle=new PDRectangle();
rectangle.setLowerLeftX(0);
rectangle.setLowerLeftY(0);
rectangle.setUpperRightX(500);
rectangle.setUpperRightY(680);
page.setMediaBox(rectangle);
page.setCropBox(rectangle);
outputDoc.addPage(page);
}
// Save to file
// outputFile.delete();
outputDoc.save(outputFile);
// Wait until the end to close all documents, or else you get an error
inputDoc.close();
outputDoc.close();
}
}
Instead of passing a rectangle to the PDPage constructor, you can do this to set the MediaBox (which the CropBox defaults to when none is set) to any size:
PDRectangle box = new PDRectangle(pageWidth, pageHeight);
page.setMediaBox(box); // MediaBox > BleedBox > TrimBox/CropBox
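For the original question (putting a 6" x 9" CropBox onto a letter page with equal padding), one route in PDFBox 2.0, which we have not tested here, is to import the source page as a form XObject with `LayerUtility.importPageAsForm(sourceDoc, pageIndex)` and draw it on a new `PDRectangle.LETTER` page with `contentStream.transform(Matrix.getTranslateInstance(tx, ty))` followed by `contentStream.drawForm(form)`. Whether the crop marks stay hidden depends on the form's bounding box being the CropBox, which you should verify. The translation arithmetic is independent of the library; the sketch below (CenterOnLetter is hypothetical) assumes 72 points per inch and subtracts the CropBox's lower-left corner because the imported content keeps its original coordinates. The 18 pt CropBox offset in `main` is an assumed example value.

```java
/**
 * Hypothetical helper: translation that centres a page's visible CropBox on a
 * larger target page, leaving equal padding on opposite sides.
 */
public class CenterOnLetter {
    static final float POINTS_PER_INCH = 72f;

    /** Returns {tx, ty}: the translation to apply before drawing the imported page. */
    public static float[] translation(float cropLlx, float cropLly, float cropW, float cropH,
                                      float targetW, float targetH) {
        float tx = (targetW - cropW) / 2f - cropLlx; // horizontal padding minus original offset
        float ty = (targetH - cropH) / 2f - cropLly; // vertical padding minus original offset
        return new float[] {tx, ty};
    }

    public static void main(String[] args) {
        float letterW = 8.5f * POINTS_PER_INCH; // 612 pt
        float letterH = 11f * POINTS_PER_INCH;  // 792 pt
        // Assumed 6" x 9" CropBox whose lower-left corner is at (18, 18) in the source page
        float[] t = translation(18, 18, 6 * POINTS_PER_INCH, 9 * POINTS_PER_INCH, letterW, letterH);
        System.out.println(t[0] + ", " + t[1]); // 72.0, 54.0
    }
}
```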
I am able to insert an image into an existing PDF document, but there are two problems:
The image is placed at the bottom of the page.
The page becomes white, with only the newly added text showing on it.
I am using the following code:
List<PDPage> pages = pdDoc.getDocumentCatalog().getAllPages();
if(pages.size() > 0){
PDJpeg img = new PDJpeg(pdDoc, in);
PDPageContentStream stream = new PDPageContentStream(pdDoc,pages.get(0));
stream.drawImage(img, 60, 60);
stream.close();
}
I want the image on the first page.
PDFBox is a low-level library for working with PDF files, so you are responsible for higher-level features. In this example, you are placing your image at (60, 60), measured from the lower-left corner of the page. That is what stream.drawImage(img, 60, 60); does.
If you want to move your image somewhere else, you have to calculate and provide the wanted location (perhaps from dimensions obtained with page.findCropBox(), or manually input your location).
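As an illustration of that calculation: PDFBox positions an image by its lower-left corner, with the origin at the lower-left corner of the page, so an image meant for the top of the page needs y = top of box - margin - image height. A minimal sketch with a hypothetical ImagePlacement class; the box values are assumed to come from `page.findCropBox()` (`getLowerLeftX()`, `getUpperRightX()`, `getUpperRightY()`) and the image size from the image's width and height in points.

```java
/**
 * Hypothetical helper: lower-left corner to pass to drawImage so that an image
 * sits in a top corner of the page. PDF y coordinates grow upwards, so the image
 * height is subtracted from the top edge.
 */
public class ImagePlacement {

    /** {x, y} for the top-left corner of the box (llx = left edge, ury = top edge). */
    public static float[] topLeft(float llx, float ury, float imageH, float margin) {
        return new float[] {llx + margin, ury - margin - imageH};
    }

    /** {x, y} for the top-right corner of the box (urx = right edge, ury = top edge). */
    public static float[] topRight(float urx, float ury, float imageW, float imageH, float margin) {
        return new float[] {urx - margin - imageW, ury - margin - imageH};
    }

    public static void main(String[] args) {
        // Letter CropBox [0 0 612 792], a 100 x 50 pt image, 36 pt (half-inch) margin
        float[] left = topLeft(0, 792, 50, 36);
        float[] right = topRight(612, 792, 100, 50, 36);
        System.out.println(left[0] + ", " + left[1]);   // 36.0, 706.0
        System.out.println(right[0] + ", " + right[1]); // 476.0, 706.0
    }
}
```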
As for the text, elements in a PDF document are absolutely positioned. There is no low-level support for reflowing text, floating elements or anything similar. If you write your text on top of your image, it will be written on top of your image.
Finally, as for your page becoming white: you are creating a new content stream and thereby overwriting the original one for your page. You should append to the already existing stream instead.
The relevant line is:
PDPageContentStream stream = new PDPageContentStream( pdDoc, pages.get(0));
What you should do is call it like this:
PDPageContentStream stream = new PDPageContentStream( pdDoc, pages.get(0), true, true);
The first true is whether to append content, and the final true (not critical here) is whether to compress the stream.
Take a look at AddImageToPDF sample available from PDFBox sources.
Try this:
doc = PDDocument.load( inputFileName );
PDXObjectImage ximage = null;
ximage = new PDJpeg(doc, new FileInputStream( image ));
PDPage page = (PDPage)doc.getDocumentCatalog().getAllPages().get(0);
PDPageContentStream contentStream = new PDPageContentStream(doc, page, true, true);
contentStream.drawImage( ximage, 425, 675 );
contentStream.close();
This draws the image on the first page. If you want it on all pages, put this code in a for loop that runs over the number of pages.
This worked well for me!
A late answer, but this is for anyone working on this in 2020 with Kotlin: drawImage() takes float values, so try this:
val file = File(getPdfFile(FILE_NAME))
val document = PDDocument.load(file)
val page = document.getPage(0)
val contentStream: PDPageContentStream
contentStream = PDPageContentStream(document, page, true, true)
// Define a content stream for adding to the PDF
val bitmap: Bitmap? = ImageSaver(this).setFileName("sign.png").setDirectoryName("signature").load()
val mediaBox: PDRectangle = page.mediaBox
val ximage: PDImageXObject = JPEGFactory.createFromImage(document, bitmap)
contentStream.drawImage(ximage, mediaBox.width - 4 * 65, 26f)
// Make sure that the content stream is closed:
contentStream.close()
// Save the final pdf document to a file
pdfSaveLocation = "$directoryPDF/$UPDATED_FILE_NAME"
val pathSave = pdfSaveLocation
document.save(pathSave)
document.close()
I am creating a new PDF and running the code below in a loop to add one image per page; the coordinates, height and width values below work well for me,
where out is a BufferedImage reference variable:
PDPage page = new PDPage();
outputdocument.addPage(page);
PDPageContentStream contentStream = new PDPageContentStream(outputdocument, page, AppendMode.APPEND, true);
PDImageXObject pdImageXObject = JPEGFactory.createFromImage(outputdocument, out);
contentStream.drawImage(pdImageXObject, 5, 2, 600, 750);
contentStream.close();
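The values 5, 2, 600, 750 suit the default letter-sized PDPage (612 x 792 pt) but will stretch images whose aspect ratio differs from 600:750. A sketch of computing them instead, assuming you want the image as large as possible inside the page minus a margin, centred and undistorted. FitImage is hypothetical; the pixel size would come from `out.getWidth()`/`out.getHeight()` (or `pdImageXObject.getWidth()`/`getHeight()`), and the result is in the argument order of `drawImage(image, x, y, width, height)`.

```java
import java.util.Arrays;

/**
 * Hypothetical helper: scales an image to fit inside a box without distorting it
 * and centres it there. Returns {x, y, width, height} for drawImage.
 */
public class FitImage {

    public static float[] fit(float imgW, float imgH, float boxX, float boxY, float boxW, float boxH) {
        float scale = Math.min(boxW / imgW, boxH / imgH); // largest scale that fits both ways
        float w = imgW * scale;
        float h = imgH * scale;
        return new float[] {boxX + (boxW - w) / 2f, boxY + (boxH - h) / 2f, w, h};
    }

    public static void main(String[] args) {
        // Default PDPage is US Letter (612 x 792 pt); keep a 5 pt margin on each side
        float[] r = fit(1200, 1600, 5, 5, 602, 782);
        System.out.println(Arrays.toString(r));
    }
}
```

Here the height is the limiting dimension, so the image fills the page vertically and is centred horizontally.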
This link gives you details about the class PrintImageLocations.
PrintImageLocations gives you the x and y coordinates of the images.
Usage: java org.apache.pdfbox.examples.util.PrintImageLocations input-pdf
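PrintImageLocations reports positions based on the current transformation matrix (CTM) in effect when an image is painted: an image occupies the unit square, and the CTM [a b c d e f] maps it onto the page. Below is a minimal sketch of that arithmetic, not the example's actual code; ImageBounds is hypothetical and assumes no skew. For an unrotated image, (e, f) is the lower-left corner; for a rotated one it is the corner the image's origin maps to. For instance, `drawImage(pdImage, 100, 500, 50, 70)` from the first question produces the matrix [50 0 0 70 100 500].

```java
import java.util.Arrays;

/**
 * Hypothetical helper: position and displayed size of an image from the CTM
 * [a b c d e f] that maps the image's unit square onto the page.
 */
public class ImageBounds {

    /** Returns {x, y, width, height}; width/height are the lengths of the scaled axes. */
    public static float[] fromCtm(float a, float b, float c, float d, float e, float f) {
        float width = (float) Math.hypot(a, b);  // length of the image's x axis on the page
        float height = (float) Math.hypot(c, d); // length of the image's y axis on the page
        return new float[] {e, f, width, height};
    }

    public static void main(String[] args) {
        // Matrix produced by drawImage(pdImage, 100, 500, 50, 70)
        System.out.println(Arrays.toString(fromCtm(50, 0, 0, 70, 100, 500))); // [100.0, 500.0, 50.0, 70.0]
    }
}
```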