I have a PDF with a CropBox size of 6" wide x 9" high. I need to add it to a standard letter-sized PDF. If I change the CropBox size, then the cropmarks become visible. So ideally what I'd like to do is crop out just the visible portion of the page, then pad the sides so that the total height and width is letter-sized.
Is this possible using PDFBox or another Java class?
Have you found an answer to your problem? I have been facing the same scenario this week.
I have a standard letter-size (8.5" x 11") PDF A, containing a header, a footer, and a form. I have no control over that PDF's generation, so the header and footer are a bit dirty and I need to remove them. My first approach was to extract the form into a Box (any type of box works), and then export it as a new PDF page. The problem is that my new Box has a certain size (let's say 6" x 7"), and after thorough research into the docs, I was unable to find a way to embed it into an 8.5" x 11" PDF B; the output PDF was the same size as my Box. All scenarios led either to a blank PDF file of the right size, or to a PDF containing my form but with the wrong dimensions.
I then had no choice but to use another approach. It isn't very clean, but hey, when working with PDFs, black magic and workarounds are the main topic. I simply kept the original PDF A and blanked out all the unwanted parts. That is, I created rectangles, filled them with white, and covered up the sections I wanted to hide. The result is a PDF file of the right dimensions, containing only my form. Hooray! Technically, the header and footer are still present in the page; there was no way to actually remove them, and I was only able to hide them (this doesn't make any difference to the end user as long as you're not hiding sensitive data).
I realize your question was submitted 2 years ago, but I had a very hard time finding a proper answer to my question online, so here's me giving back to the community, and hoping I can help future developers save some time. If you actually found a way to extract a box and embed it in a standard-size page, please post your answer!
Here is my code, by the way:
import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.*;
import org.apache.pdfbox.pdmodel.edit.PDPageContentStream;
import java.awt.Color;
import java.io.*;
import java.util.List;
// This code doesn't actually extract PDF elements per se
// It fills 2 rectangles in white to hide the header and the footer of our PDF page
public class ex {
// Arbitrary values obtained in a very obscure way
static int PAGE_WIDTH = 615;
static int PAGE_HEIGHT = 815;
@SuppressWarnings("unchecked")
public static void main(String[] args) throws IOException, COSVisitorException {
File inputFile = new File("C:\\input.pdf");
File outputFile = new File("C:\\output.pdf");
PDDocument inputDoc = PDDocument.load(inputFile);
PDDocument outputDoc = new PDDocument();
List<PDPage> pages = inputDoc.getDocumentCatalog().getAllPages();
PDPageContentStream pageCS = null;
// Lets paint our pages white !
for (PDPage page : pages) {
pageCS = new PDPageContentStream(inputDoc, page, true, false);
pageCS.setNonStrokingColor(Color.white);
// Bottom rectangle (the PDF origin is the lower-left corner, so y = 0 is the bottom edge)
pageCS.fillRect(0, 0, PAGE_WIDTH, 30);
// Top rectangle
pageCS.fillRect(0, PAGE_HEIGHT-30, PAGE_WIDTH, 30);
pageCS.close();
outputDoc.addPage(page);
}
// Save to file
outputFile.delete();
outputDoc.save(outputFile);
// Wait until the end to close all documents, or else you get an error
inputDoc.close();
outputDoc.close();
}
}
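The page dimensions above do not have to be guessed. By default, PDF user space uses 72 units per inch with the origin in the lower-left corner, so a letter page is 612 x 792 units. A minimal sketch of that arithmetic in plain Java follows; the class and method names are hypothetical and nothing here calls PDFBox, it only computes the rectangles one would pass to a fill call.

```java
// Hypothetical helper, not part of PDFBox: plain arithmetic only.
class PageStrips {
    // Default PDF user-space unit: 1/72 inch.
    static final float POINTS_PER_INCH = 72f;

    static float inchesToPoints(float inches) {
        return inches * POINTS_PER_INCH;
    }

    // {x, y, width, height} of a strip along the bottom edge (y starts at 0).
    static float[] bottomStrip(float pageWidth, float stripHeight) {
        return new float[] {0f, 0f, pageWidth, stripHeight};
    }

    // {x, y, width, height} of a strip along the top edge.
    static float[] topStrip(float pageWidth, float pageHeight, float stripHeight) {
        return new float[] {0f, pageHeight - stripHeight, pageWidth, stripHeight};
    }

    public static void main(String[] args) {
        float width = inchesToPoints(8.5f);  // letter width in points
        float height = inchesToPoints(11f);  // letter height in points
        float[] top = topStrip(width, height, 30f);
        System.out.println(width + " x " + height + ", top strip starts at y=" + top[1]);
    }
}
```

If the source page has a MediaBox that does not start at (0, 0), the strip origins would need to be shifted by its lower-left corner; this sketch assumes it does start there.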
I have adapted John's answer a little bit; maybe this will help someone.
I have changed the loop to create a new rectangle with the wanted dimensions. The rectangle is then set on the page, and the page is added to the new document. I used this snippet to crop a black border out of a long scanned document.
Note that this will change the size of the pages.
import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.edit.PDPageContentStream;
import java.io.File;
import java.io.IOException;
import java.util.List;
public class Main {
@SuppressWarnings("unchecked")
public static void main(String[] args) throws IOException, COSVisitorException {
File inputFile = new File("/path/to/your/file");
File outputFile = new File("/path/to/your/file");
PDDocument inputDoc = PDDocument.load(inputFile);
PDDocument outputDoc = new PDDocument();
List<PDPage> pages = inputDoc.getDocumentCatalog().getAllPages();
// Resize each page to the wanted rectangle
for (PDPage page : pages) {
PDRectangle rectangle=new PDRectangle();
rectangle.setLowerLeftX(0);
rectangle.setLowerLeftY(0);
rectangle.setUpperRightX(500);
rectangle.setUpperRightY(680);
page.setMediaBox(rectangle);
page.setCropBox(rectangle);
outputDoc.addPage(page);
}
// Save to file
// outputFile.delete();
outputDoc.save(outputFile);
// Wait until the end to close all documents, or else you get an error
inputDoc.close();
outputDoc.close();
}
}
Other than passing a rectangle to the PDPage constructor, you can do this to set the page box (and with it the default CropBox) to any size:
PDRectangle box = new PDRectangle(pageWidth, pageHeight);
page.setMediaBox(box); // MediaBox > BleedBox > TrimBox/CropBox
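For the original question (a 6" x 9" visible area padded out to letter size), the padding comes down to a translation that centers the CropBox on a 612 x 792 page. The following is a minimal sketch of that calculation in plain Java. The class and method names are hypothetical, and it does not show how the translation is applied, since that depends on the library and version in use (for example, drawing the source page as a clipped form on a new letter-sized page).

```java
// Hypothetical helper: plain arithmetic, no PDFBox types.
class CenterOnLetter {
    static final float LETTER_WIDTH = 612f;   // 8.5 in * 72
    static final float LETTER_HEIGHT = 792f;  // 11 in * 72

    // cropBox is {lowerLeftX, lowerLeftY, upperRightX, upperRightY} in points.
    // Returns {tx, ty}: the shift that centers the crop area on a letter page.
    static float[] centeringOffset(float[] cropBox) {
        float cropWidth = cropBox[2] - cropBox[0];
        float cropHeight = cropBox[3] - cropBox[1];
        float tx = (LETTER_WIDTH - cropWidth) / 2f - cropBox[0];
        float ty = (LETTER_HEIGHT - cropHeight) / 2f - cropBox[1];
        return new float[] {tx, ty};
    }

    public static void main(String[] args) {
        // A 6" x 9" crop area (432 x 648 points) starting at the origin.
        float[] offset = centeringOffset(new float[] {0f, 0f, 432f, 648f});
        System.out.println("tx=" + offset[0] + ", ty=" + offset[1]);
    }
}
```

Subtracting the lower-left corner matters when the CropBox sits inside a larger MediaBox that also holds the crop marks, which is the situation described in the question.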
I am using PDFBox for the first time to generate a PDF. I have a text document which consists of a series of about 40 multi-choice questions generated by my java program. Some of the questions have associated small images which need to be inserted above the question.
For this reason I am converting the text document to a PDF and hope to insert the images on that.
I have managed to insert an image into the PDF document, but it lies under the text like a background.
I want to place the images in line with the text (as with an inline text box in Word).
It seems the image-insertion classes need an absolute position, which will depend on the position of the text.
How can I know where to draw my image?
For info: I am using PDFBox 2.0.7.
import ExamDatabase.ReadInputFile;
import java.io.IOException;
import java.util.ArrayList;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.pdmodel.font.PDFontFactory;//???look up
import org.apache.pdfbox.pdmodel.font.PDTrueTypeFont;
import org.apache.pdfbox.pdmodel.font.PDType3Font;
import org.apache.pdfbox.pdmodel.font.PDSimpleFont;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
import org.apache.pdfbox.pdmodel.graphics.image.PDInlineImage;
/**
*
* @author Steve carr
*/
public class HelloWorldPdf1_1_1
{
//runs
/**
* @param args the command line arguments
* @throws java.io.IOException
*/
public static void main(String[] args) throws IOException
{
ReadInputFile fileI = new ReadInputFile();// read plain text file text file
ArrayList<String> localList = fileI.readerNew();
// Create a document and add a page to it
try (PDDocument document = new PDDocument())
{
PDPage page = new PDPage();
document.addPage(page);
// Create a new font1 object selecting one of the PDF base fonts
PDFont font1 = PDType1Font.HELVETICA;//TIMES_ROMAN;
PDFont font2 = PDType1Font.TIMES_ROMAN;
PDFont font3 = PDType1Font.COURIER_BOLD;
try (PDPageContentStream contentStream = new PDPageContentStream(document, page))
{
//Creating PDImageXObject object
PDImageXObject pdImage = PDImageXObject.createFromFile("C:/PdfBox_Examples/CARD00.GIF", document);
//**creating the PDPageContentStream object
//PDPageContentStream contents = new PDPageContentStream(document, page);
//**Drawing the image in the PDF document
contentStream.drawImage(pdImage, 100, 500, 50, 70);//1ST number is horizontal posn from left
//****TEXTTEXTTEXTTEXT
// Define a text content stream using the selected font1, moving the cursor and drawing the text "Hello World"
contentStream.beginText();
contentStream.setFont(font1, 11);
contentStream.newLineAtOffset(0, 0);
contentStream.setCharacterSpacing(0);
contentStream.setWordSpacing(0);
contentStream.setLeading(0);
contentStream.setLeading(14.5f);// this was key for some reason
contentStream.moveTextPositionByAmount(100, 700);// sets the start point of text
System.out.println("localList.size= " + localList.size());//just checking within bounds during testing
String line;
int i;
for (i = 0; i < 138; ++i)
{
System.out.println(localList.get(i));
line = localList.get(i);
contentStream.drawString(line);
contentStream.newLine();
}
contentStream.endText();
//******************************************************
// Make sure that the content stream is closed:
contentStream.close();
}
// Save the results and ensure that the document is properly closed:
document.save("Hello World.pdf");
}
}
}
result output with text written on top of image:
As per this PDFBox fix, https://issues.apache.org/jira/browse/PDFBOX-738, transparency is preserved only when RGBA is set. If transparency is preserved, the image will look in line with the text rather than like an overlay, so this could be a solution for the first part of your problem, i.e. the overlay issue.
And this example shows how to compute the width occupied by a specific text, and thus how to calculate where to place the image after the text:
https://svn.apache.org/viewvc/pdfbox/trunk/examples/src/main/java/org/apache/pdfbox/examples/interactive/form/DetermineTextFitsField.java?revision=1749360&view=markup
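To place images relative to the text, one option is to track the vertical position yourself instead of relying on newLine() alone. The sketch below is plain Java with hypothetical names and makes two assumptions: string widths are available in 1/1000 text units (which is how PDFBox's PDFont.getStringWidth reports them), and each line may have one image that sits directly above it. It returns, for every line, the lower-left y of its image (NaN if none) and the text baseline.

```java
// Hypothetical layout helper: no PDFBox calls, only position bookkeeping.
class InlineLayout {
    // Converts a width in 1/1000 text units to points at the given font size.
    static float textWidthInPoints(float widthInThousandths, float fontSize) {
        return widthInThousandths / 1000f * fontSize;
    }

    // imageHeights[i] is the height of the image above line i, or 0 for none.
    // Result row i: [0] = image lower-left y (NaN if no image), [1] = baseline y.
    static float[][] layout(float startY, float leading, float[] imageHeights) {
        float[][] positions = new float[imageHeights.length][2];
        float cursorY = startY;
        for (int i = 0; i < imageHeights.length; i++) {
            if (imageHeights[i] > 0f) {
                cursorY -= imageHeights[i];   // reserve room for the image
                positions[i][0] = cursorY;
            } else {
                positions[i][0] = Float.NaN;
            }
            cursorY -= leading;               // move down to this line's baseline
            positions[i][1] = cursorY;
        }
        return positions;
    }

    public static void main(String[] args) {
        float[][] p = layout(700f, 14.5f, new float[] {0f, 70f, 0f});
        for (float[] row : p) {
            System.out.println("image y=" + row[0] + ", baseline y=" + row[1]);
        }
    }
}
```

With positions like these, each line would be drawn at its own absolute baseline and each image with drawImage at the stored y, outside any beginText()/endText() pair. Descenders of the previous line may call for a small extra gap, which this sketch leaves out.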
I'm using PDFBox 1.8.11 and FOP to add a watermark to PDFs. It works nicely for most input PDF files.
However, I get a problem when the file is in landscape: the watermark is rotated 90 degrees to the right.
I had a similar problem with a visible signature, which is now fixed thanks to the solution in sign landscape file. Any idea how to make the watermark rotation work? Thanks in advance!
The original picture for the watermark is:
[image: up arrow]
After the FOP watermark, the image is rotated:
[image: image rotated]
Apologies for the late answer.
The idea for the watermark here is to add some transforms to the original PDF using Apache FOP. You can find Java code examples and FO template examples on the Apache FOP website.
In any case, I will illustrate the example here too:
1. The Java code showing how to use FOP
import org.apache.fop.apps.*;
import org.xml.sax.*;
import java.io.*;
import javax.xml.transform.*;
import javax.xml.transform.sax.*;
import javax.xml.transform.stream.*;
class rendtest {
private static FopFactory fopFactory = FopFactory.newInstance(new File(".").toURI());
private static TransformerFactory tFactory = TransformerFactory.newInstance();
public static void main(String args[]) {
OutputStream out;
try {
//Load the stylesheet
Templates templates = tFactory.newTemplates(
new StreamSource(new File(args[1])));
//First run (to /dev/null)
out = new org.apache.commons.io.output.NullOutputStream();
FOUserAgent foUserAgent = fopFactory.newFOUserAgent();
Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF, foUserAgent, out);
Transformer transformer = templates.newTransformer();
transformer.setParameter("page-count", "#");
transformer.transform(new StreamSource(new File(args[0])),
new SAXResult(fop.getDefaultHandler()));
//Get total page count
String pageCount = Integer.toString(fop.getResults().getPageCount());
//Second run (the real thing)
out = new java.io.FileOutputStream(args[2]);
out = new java.io.BufferedOutputStream(out);
try {
foUserAgent = fopFactory.newFOUserAgent();
fop = fopFactory.newFop(MimeConstants.MIME_PDF, foUserAgent, out);
transformer = templates.newTransformer();
transformer.setParameter("page-count", pageCount);
transformer.transform(new StreamSource(new File(args[0])),
new SAXResult(fop.getDefaultHandler()));
} finally {
out.close();
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
As for the problem I had with rendering landscape PDFs: in the FOP template you only need to add one more attribute to indicate that the file is in landscape layout.
The attribute to set is reference-orientation="90". Your other definitions in the FOP template will then be applied properly.
I have a huge PDF containing more than 1000 pages and need to edit existing text. The text was added by me to each page using the PDFBox "add text to each page" example, but the font size was very big and the text runs off the page.
Now I want to decrease the font size so that the text stays within the page limits, or clear the existing text and replace it with the same text in a new font.
Credits to Tilman Hausherr for the answer:
If you used the code you linked to, then you will find the "added message" in the content stream array, as the second last item. Get the array with page.getCOSObject().getItem(COSName.CONTENTS), remove the item, and save the file.
public void removeStamp(File src) throws IOException {
PDDocument doc = PDDocument.load(src);
PDPageTree pages = doc.getPages();
for (PDPage page : pages) {
COSArray array = ((COSArray) page.getCOSObject().getItem(COSName.CONTENTS));
array.remove(array.size() - 1);
}
doc.save(src);
// Release the resources held by the document once it has been saved
doc.close();
}
Using iTextSharp, I am creating a PDF composed of a collection of existing PDFs. Some of the included PDFs are in landscape orientation and need to be rotated, so I do the following:
private static void AdjustRotationIfNeeded(PdfImportedPage pdfImportedPage, PdfReader reader, int documentPage)
{
float width = pdfImportedPage.Width;
float height = pdfImportedPage.Height;
if (pdfImportedPage.Rotation != 0)
{
PdfDictionary pageDict = reader.GetPageN(documentPage);
pageDict.Put(PdfName.ROTATE, new PdfNumber(0));
}
if (width > height)
{
PdfDictionary pageDict = reader.GetPageN(documentPage);
pageDict.Put(PdfName.ROTATE, new PdfNumber(270));
}
}
This works great. The included PDFs are rotated to portrait orientation if needed. The PDF prints correctly on my local printer.
This file is sent to a fulfillment house, and unfortunately, the landscape included files do not print properly when going through their printer and rasterization process. They use Kodak (Creo) NexRip 11.01 or Kodak (Creo) Prinergy 6.1. machines. The fulfillment house's suggestion is to: "generate a new PDF file after we rotate pages or make any changes to a PDF. It is as easy as exporting out to a PostScript and distilling back to a PDF."
I know iTextSharp doesn't support PostScript. Is there another way iTextSharp can rotate included PDFs to hold the orientation when rasterized?
First let me assure you that changing the rotation in the page dictionary is the correct procedure to achieve what you want. As far as I can see your code, there's nothing wrong with it. You are doing the right thing.
Unfortunately, you are faced with a third party product over which you have no control that is not doing the right thing. How to solve this?
I have written an example called IncorrectExample. I have named it that way because I don't want it to be used in a context that is different from yours. You can safely ignore all the warnings I added: they are not meant for you. This example is very specific to your problem.
Please try the following code:
public void manipulatePdf(String src, String dest)
throws IOException, DocumentException {
// Creating a reader
PdfReader reader = new PdfReader(src);
// step 1
Rectangle pagesize = getPageSize(reader, 1);
Document document = new Document(pagesize);
// step 2
PdfWriter writer
= PdfWriter.getInstance(document, new FileOutputStream(dest));
// step 3
document.open();
// step 4
PdfContentByte cb = writer.getDirectContent();
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
pagesize = getPageSize(reader, i);
document.setPageSize(pagesize);
document.newPage();
PdfImportedPage page = writer.getImportedPage(reader, i);
if (isPortrait(reader, i)) {
cb.addTemplate(page, 0, 0);
}
else {
cb.addTemplate(page, 0, 1, -1, 0, pagesize.getWidth(), 0);
}
}
// step 5
document.close();
reader.close();
}
public Rectangle getPageSize(PdfReader reader, int pagenumber) {
Rectangle pagesize = reader.getPageSizeWithRotation(pagenumber);
return new Rectangle(
Math.min(pagesize.getWidth(), pagesize.getHeight()),
Math.max(pagesize.getWidth(), pagesize.getHeight()));
}
public boolean isPortrait(PdfReader reader, int pagenumber) {
Rectangle pagesize = reader.getPageSize(pagenumber);
return pagesize.getHeight() > pagesize.getWidth();
}
I have taken the pages.pdf file as an example. This file is special in the sense that it has two pages in landscape that are created in a different way:
one page is a page of which the width is smaller than the height (sounds like it's a page in portrait), but as there's a /Rotate value of 90 added to the page dictionary, it is shown in landscape.
the other page isn't rotated, but it has a height that is smaller than the width.
In my example, I am using the classes Document and PdfWriter to create a copy of the original document. This is wrong in general because it throws away all interaction. I should use PdfStamper or PdfCopy instead, but it is right in your specific case because you don't need the interactivity: the final purpose of the PDF is to be printed.
With Document, I create new pages using a new Rectangle that uses the lowest value of the dimensions of the existing page as the width and the highest value as the height. This way, the page will always be in portrait. Note that I use the method getPageSizeWithRotation() to make sure I get the correct width and height, taking into account any possible rotation.
I then add a PdfImportedPage to the direct content of the writer. I use the isPortrait() method to find out if I need to rotate the page or not. Observe that the isPortrait() method looks at the page size without taking into account the rotation. If we did take into account the rotation, we'd rotate pages that don't need rotating.
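The six numbers passed to addTemplate() for landscape pages form an affine matrix (a, b, c, d, e, f) that maps (x, y) to (a·x + c·y + e, b·x + d·y + f). As a sanity check outside iText, the sketch below feeds the same numbers to java.awt.geom.AffineTransform, whose six-argument constructor takes them in the same order, and maps the corners of a landscape letter page. The class and method names are hypothetical, and the 792 x 612 page is an assumed example.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

// Hypothetical check of the rotation matrix; it does not use iText.
class RotationCheck {
    // Same six numbers as cb.addTemplate(page, 0, 1, -1, 0, portraitWidth, 0):
    // a 90-degree counter-clockwise rotation followed by a shift to the right.
    static AffineTransform landscapeToPortrait(double portraitWidth) {
        return new AffineTransform(0.0, 1.0, -1.0, 0.0, portraitWidth, 0.0);
    }

    static Point2D map(AffineTransform transform, double x, double y) {
        return transform.transform(new Point2D.Double(x, y), null);
    }

    public static void main(String[] args) {
        // Landscape letter page (792 wide, 612 high) on a 612 x 792 portrait page.
        AffineTransform t = landscapeToPortrait(612.0);
        double[][] corners = {{0, 0}, {792, 0}, {0, 612}, {792, 612}};
        for (double[] c : corners) {
            Point2D mapped = map(t, c[0], c[1]);
            System.out.println("(" + c[0] + ", " + c[1] + ") -> ("
                    + mapped.getX() + ", " + mapped.getY() + ")");
        }
    }
}
```

Every mapped corner lands inside the 612 x 792 portrait rectangle, which is why the translation uses the portrait page width: without it, the rotated content would end up at negative x coordinates, off the page.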
The resulting PDF can be found here: pages_changed.pdf
As you can see, some information got lost: there was an annotation on the final page: it's gone. There were specific viewer preferences defined for the original document: they're gone. But that shouldn't matter in your specific case, because all that matters for you is that the pages are printed correctly.
I'm generating a PDF with iText, in which I'm displaying a header and footer.
Now I want to remove the header for a particular page.
For example: if I'm generating a 50-page PDF, I don't want to show the header on the final (50th) page.
How could this be achieved?
Here's my code where I'm generating the footer (header part removed).
public class HeaderAndFooter extends PdfPageEventHelper {
public void onEndPage (PdfWriter writer, Document document) {
Rectangle rect = writer.getBoxSize("art");
switch(writer.getPageNumber() % 2) {
case 0:
case 1:
ColumnText.showTextAligned(writer.getDirectContent(),
Element.ALIGN_CENTER, new Phrase(String.format("%d", writer.getPageNumber())),
300f, 62f, 0);
break;
}
}
}
Any suggestions? Thanks in advance.
You can use a 2-pass approach:
1st pass : generate the PDF file without header
2nd pass : stamp the header on all but the last page
Have a look at this example taken from the iText book. You'll just have to adapt the second pass by only going through the N-1 first pages:
int n = reader.getNumberOfPages() - 1;
instead of
int n = reader.getNumberOfPages();
I also needed to do the same, and I want to share how I resolved this issue.
The idea is that, for the automatic generation of the header and footer, we set a page event on the PdfWriter like this:
HeaderAndFooter event = new HeaderAndFooter(); // HeaderAndFooter is the implementation of the PdfPageEventHelper class
writer.setPageEvent(event); // writer is the instance of PdfWriter
So, before adding the content of the last page, we can remove the event:
event=null;
writer.setPageEvent(event);
It works for me without any error or exception.