Splitting at a specific point in PDFBox - pdf

I would like to split to generate a new pdf by concatenating certain individual pages, but the last page has to be split at a certain point (i.e. all content above a limit to be included and everything below to be excluded - I only care about the ones having their upper left corner above a line). Is that possible using PDFbox?

One way to achieve the task, i.e. to split a page at a certain point (i.e. all content above a limit to be included and everything below to be excluded) would be to prepend a clip path.
You can use this method:
void clipPage(PDDocument document, PDPage page, BoundingBox clipBox) throws IOException
{
PDPageContentStream pageContentStream = new PDPageContentStream(document, page, true, false);
pageContentStream.addRect(clipBox.getLowerLeftX(), clipBox.getLowerLeftY(), clipBox.getWidth(), clipBox.getHeight());
pageContentStream.clipPath(PathIterator.WIND_NON_ZERO);
pageContentStream.close();
COSArray newContents = new COSArray();
COSStreamArray contents = (COSStreamArray) page.getContents().getStream();
newContents.add(contents.get(contents.getStreamCount()-1));
for (int i = 0; i < contents.getStreamCount()-1; i++)
{
newContents.add(contents.get(i));
}
page.setContents(new PDStream(new COSStreamArray(newContents)));
}
to clip the given page along the given clipBox. (It first creates a new content stream defining the clip path and then arranges this stream to be the first one of the page.)
E.g. to clip the content of a page along the horizontal line 650 units above the bottom, do this:
PDPage page = ...
PDRectangle cropBox = page.findCropBox();
clipPage(document, page, new BoundingBox(
cropBox.getLowerLeftX(),
cropBox.getLowerLeftY() + 650,
cropBox.getUpperRightX(),
cropBox.getUpperRightY()));
For a running example look here: ClipPage.java.

Related

How to convert existing pdf files to A4 size using pdfbox?

I want to set a size(A4) to an existing document.
I am using pdfbox for watermarking. I used the following link to add watermark. Here I am using another file in which watermark text is there. Latter we are only adding this layer as overlay to original file.
Here the problem arises when file with watermark text is with different size than original document to which the watermark is to be added. In those case the watermark is not getting added properly in terms of position.
Version: I am using pdfbox 1.8. I tried with 2.0 but I am more comfortable with this version.
Here is the code
PDDocument originalPdfFile = PDDocument.load(filename);
PDRectangle pdRect=new PDRectangle(595, 842);//Here I am setting height and width in terms of points
List PageList = originalPdfFile.getDocumentCatalog().getAllPages();
int noOfPages=PageList.size();
System.out.println("No of pages in original document="+noOfPages);
PDPage page=new PDPage();
//PDPage page=new PDPage(PDPage.PAGE_SIZE_A4);
//Here also I tried to add page size
for (int i = 0; i < PageList.size(); i++) {
page=(PDPage)PageList.get(i);
System.out.println("Original Document size in page before cropping: "+(i+1)+", Page Resolution: "+page.getMediaBox());
page.setMediaBox(pdRect);
System.out.println("Original Document size in page after cropping: "+(i+1)+", Page Resolution: "+page.getMediaBox());
//System.out.println("Original Document size in page: "+i+", Height: "+page.getMediaBox().getHeight()+",Width: "+page.getMediaBox().getWidth());
PDRectangle rec=page.getMediaBox();
generateWatermarkText(organisationName,rec);
}
HashMap<Integer, String> overlayGuide = new HashMap<Integer, String>();
for(int i=0; i<originalPdfFile.getNumberOfPages(); i++)
{
overlayGuide.put(i+1, "C:/drm/final/final.pdf");
//watermarktext.pdf is the document which is a one page PDF with your watermark image in it.
}
Overlay overlay = new Overlay();
overlay.setInputPDF(originalPdfFile);
overlay.setOutputFile(filename);
overlay.setOverlayPosition(Overlay.Position.FOREGROUND);
overlay.overlay(overlayGuide,false);
//pdf will have the original PDF with watermarks.
The above code add watermark successfully but I am not able to shrink the page.
This line
PDRectangle pdRect=new PDRectangle(595, 842);
crops the page but it cuts the contains of the page, which I don't want. I want the contains but to should be fit in that page and the page should be of specified size(like A4 in my case).

Increase left margin of an existing pdf using iTextSharp [duplicate]

My web application signs PDF documents. I would like to let users download the original PDF document (not signed) but adding an image and the signers in the left margin of the pdf document.
I've seen this idea in another web application, and I would like to do the same. Of course I would like to do it using itext library.
I have attached two images, the original PDF document (not signed) and the modified PDF document.
First this: it is important to change the document before you digitally sign it. Once digitally signed, these changes will break the signature.
I will break up the question in two parts and I'll skip the part about the actual watermarking as this is already explained here: How to watermark PDFs using text or images?
This question is not a duplicate of that question, because of the extra requirement to add an extra margin to the right.
Take a look at the primes.pdf document. This is the source file we are going to use in the AddExtraMargin example with the following result: primes_extra_margin.pdf. As you can see, a half an inch margin was added to the left of each page.
This is how it's done:
public void manipulatePdf(String src, String dest) throws IOException, DocumentException {
PdfReader reader = new PdfReader(src);
int n = reader.getNumberOfPages();
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
// properties
PdfContentByte over;
PdfDictionary pageDict;
PdfArray mediabox;
float llx, lly, ury;
// loop over every page
for (int i = 1; i <= n; i++) {
pageDict = reader.getPageN(i);
mediabox = pageDict.getAsArray(PdfName.MEDIABOX);
llx = mediabox.getAsNumber(0).floatValue();
lly = mediabox.getAsNumber(1).floatValue();
ury = mediabox.getAsNumber(3).floatValue();
mediabox.set(0, new PdfNumber(llx - 36));
over = stamper.getOverContent(i);
over.saveState();
over.setColorFill(new GrayColor(0.5f));
over.rectangle(llx - 36, lly, 36, ury - llx);
over.fill();
over.restoreState();
}
stamper.close();
reader.close();
}
The PdfDictionary we get with the getPageN() method is called the page dictionary. It has plenty of information about a specific page in the PDF. We are only looking at one entry: the /MediaBox. This is only a proof of concept. If you want to write a more robust application, you should also look at the /CropBox and the /Rotate entry. Incidentally, I know that these entries don't exist in primes.pdf, so I am omitting them here.
The media box of a page is an array with four values that represent a rectangle defined by the coordinates of its lower-left and upper-right corner (usually, I refer to them as llx, lly, urx and ury).
In my code sample, I change the value of llx by subtracting 36 user units. If you compare the page size of both PDFs, you'll see that we've added half an inch.
We also use these coordinates to draw a rectangle that covers the extra half inch. Now switch to the other watermark examples to find out how to add text or other content to each page.
Update:
if you need to scale down the existing pages, please read Fix the orientation of a PDF in order to scale it

iTextSharp rotated PDF page reverts orientation when file is rasterized at print house

Using iTextSharp I am creating a PDF composed of a collection of existing PDFs, some of the included PDFs are landscape orientation and need to be rotated. So, I do the following:
private static void AdjustRotationIfNeeded(PdfImportedPage pdfImportedPage, PdfReader reader, int documentPage)
{
float width = pdfImportedPage.Width;
float height = pdfImportedPage.Height;
if (pdfImportedPage.Rotation != 0)
{
PdfDictionary pageDict = reader.GetPageN(documentPage);
pageDict.Put(PdfName.ROTATE, new PdfNumber(0));
}
if (width > height)
{
PdfDictionary pageDict = reader.GetPageN(documentPage);
pageDict.Put(PdfName.ROTATE, new PdfNumber(270));
}
}
This works great. The included PDFs rotated to portrait orientation if needed. The PDF prints correctly on my local printer.
This file is sent to a fulfillment house, and unfortunately, the landscape included files do not print properly when going through their printer and rasterization process. They use Kodak (Creo) NexRip 11.01 or Kodak (Creo) Prinergy 6.1. machines. The fulfillment house's suggestion is to: "generate a new PDF file after we rotate pages or make any changes to a PDF. It is as easy as exporting out to a PostScript and distilling back to a PDF."
I know iTextSharp doesn't support PostScript. Is there another way iTextSharp can rotate included PDFs to hold the orientation when rasterized?
First let me assure you that changing the rotation in the page dictionary is the correct procedure to achieve what you want. As far as I can see your code, there's nothing wrong with it. You are doing the right thing.
Unfortunately, you are faced with a third party product over which you have no control that is not doing the right thing. How to solve this?
I have written an example called IncorrectExample. I have named it that way because I don't want it to be used in a context that is different from yours. You can safely ignore all the warnings I added: they are not meant for you. This example is very specific to your problem.
Please try the following code:
public void manipulatePdf(String src, String dest)
throws IOException, DocumentException {
// Creating a reader
PdfReader reader = new PdfReader(src);
// step 1
Rectangle pagesize = getPageSize(reader, 1);
Document document = new Document(pagesize);
// step 2
PdfWriter writer
= PdfWriter.getInstance(document, new FileOutputStream(dest));
// step 3
document.open();
// step 4
PdfContentByte cb = writer.getDirectContent();
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
pagesize = getPageSize(reader, i);
document.setPageSize(pagesize);
document.newPage();
PdfImportedPage page = writer.getImportedPage(reader, i);
if (isPortrait(reader, i)) {
cb.addTemplate(page, 0, 0);
}
else {
cb.addTemplate(page, 0, 1, -1, 0, pagesize.getWidth(), 0);
}
}
// step 4
document.close();
reader.close();
}
public Rectangle getPageSize(PdfReader reader, int pagenumber) {
Rectangle pagesize = reader.getPageSizeWithRotation(pagenumber);
return new Rectangle(
Math.min(pagesize.getWidth(), pagesize.getHeight()),
Math.max(pagesize.getWidth(), pagesize.getHeight()));
}
public boolean isPortrait(PdfReader reader, int pagenumber) {
Rectangle pagesize = reader.getPageSize(pagenumber);
return pagesize.getHeight() > pagesize.getWidth();
}
I have taken the pages.pdf file as an example. This file is special in the sense that it has two pages in landscape that are created in a different way:
one page is a page of which the width is smaller than the height (sounds like it's a page in portrait), but as there's a /Rotate value of 90 added to the page dictionary, it is shown in landscape.
the other page isn't rotated, but it has a height that is smaller than the width.
In my example, I am using the classes Document and PdfWriter to create a copy of the original document. This is wrong in general because it throws away all interaction. I should use PdfStamper or PdfCopy instead, but it is right in your specific case because you don't need the interactivity: the final purpose of the PDF is to be printed.
With Document, I create new pages using a new Rectangle that uses the lowest value of the dimensions of the existing page as the width and the highest value as the height. This way, the page will always be in portrait. Note that I use the method getPageSizeWithRotation() to make sure I get the correct width and height, taking into account any possible rotation.
I then add a PdfImportedPage to the direct content of the writer. I use the isPortrait() method to find out if I need to rotate the page or not. Observe that the isPortrait() method looks at the page size without taking into account the rotation. If we did take into account the rotation, we'd rotate pages that don't need rotating.
The resulting PDF can be found here: pages_changed.pdf
As you can see, some information got lost: there was an annotation on the final page: it's gone. There were specific viewer preferences defined for the original document: they're gone. But that shouldn't matter in your specific case, because all that matters for you is that the pages are printed correctly.

Cannot stamp certain PDFs

PDF stamping works for nearly every document I have tried. However, a client scanned some pages and his computer generated a PDF document that is resistant to stamping. The embedded image files are in JBIG2 format, but I am not sure if that is important. I have debugged the PDF with Apache's pdfbox, and I can see the text is embedded. It just doesn't show up.
Here is the PDF that won't stamp: http://demo.clearvillageinc.com/plans.pdf
And my code:
static void Main(string[] args) {
string stamp = "<div style=\"color:#F00;\">Reviewed for Code Compliance</div>";
string fileName = #"C:\temp\source.pdf";
string outputFileName = #"C:\temp\source-output.pdf";
// Open a destination stream.
using (var destStream = new System.IO.MemoryStream()) {
using (var sourceReader = new PdfReader(fileName)) {
// Convert the HTML into a stamp.
using (var stampData = FromHtml(stamp)) {
using (var stampReader = new PdfReader(stampData)) {
using (var stamper = new PdfStamper(sourceReader, destStream)) {
stamper.Writer.CloseStream = false;
// Add the stamp stream to the source document.
var stampPage = stamper.GetImportedPage(stampReader, 1);
// Process all of the pages in the source document.
for (int i = 1; i <= sourceReader.NumberOfPages; i++) {
var canvas = stamper.GetOverContent(i);
canvas.AddTemplate(stampPage, 0, -50);
}
}
}
}
}
// Finished. Save the file.
using (var fs = new System.IO.FileStream(outputFileName, FileMode.Create)) {
destStream.Position = 0;
destStream.CopyTo(fs);
}
}
}
public static System.IO.Stream FromHtml(string html) {
var ms = new System.IO.MemoryStream();
// Convert html to pdf.
using (var document = new iTextSharp.text.Document()) {
var writer = iTextSharp.text.pdf.PdfWriter.GetInstance(document, ms);
writer.CloseStream = false;
document.Open();
using (var sr = new System.IO.StringReader(html)) {
XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, sr);
}
}
ms.Position = 0; // Reset for reading.
return ms;
}
One part of a page definition is the "MediaBox" which controls the page's size. This property takes two locations that specify the coordinates of two opposite corners of a rectangle. Although not required, most PDFs specify the lower left corner first followed by the upper right corner. Also, most PDF use 0x0 for the lower left and then whatever the page's width and height for the top corner. So an 8.5x11 inch PDF would be 0,0 and 612,792 (8.5 * 72 = 612 and 11 * 72 = 792) and this would be written as 0,0,612,792.
Your scanned PDF, however, has for whatever reason decided to treat 0,7072 as the lower left corner and 614,7864 as the top right corner. That still gives us (almost) an 8.5x11 page size but if you try to draw something at 0,0 it will be 7,072 pixels below the actual page. You can see this in Acrobat Pro by zooming out very far (1% for me), picking Tools, Edit Object and then doing a Select All. You should see something way far down selected, too.
To get around this, you need to respect the page's boundaries.
for (int i = 1; i <= sourceReader.NumberOfPages; i++) {
//Get the page to be stamped
var pageToBeStamped = sourceReader.GetPageSize(i);
var canvas = stamper.GetOverContent(i);
//Offset our new page by 50 pixels off of the destination page's bottom
canvas.AddTemplate(stampPage, pageToBeStamped.Left, pageToBeStamped.Bottom - 50);
}
The code above gets the rectangle for the imported page and uses bottom offset by 50 pixels (from your original code). Also, although not a problem in your case, we use the imported page's actual left edge instead of just zero.
This code can still break, however. The math in the first paragraph uses 72 which is the default for PDFs but this can be changed. Most people don't change it but most people also don't change 0,0. Currently your -50 assumes the 72 which gives the visual perception of moving the stamp about seven-tenths of an inch from the top edge. If you run into this scenario you'll want to look into retrieving the user unit.
Also, as I said in the first paragraph, most applications use lower left upper right but this isn't a hard rule. Someone could specify upper right and bottom left or even top left and bottom right. This is a hard one to take into account but it is something that you should at least be aware of.

How to exactly position an Image inside an existing PDF page using PDFBox?

I am able to insert an Image inside an existing pdf document, but the problem is,
The image is placed at the bottom of the page
The page becomes white with the newly added text showing on it.
I am using following code.
List<PDPage> pages = pdDoc.getDocumentCatalog().getAllPages();
if(pages.size() > 0){
PDJpeg img = new PDJpeg(pdDoc, in);
PDPageContentStream stream = new PDPageContentStream(pdDoc,pages.get(0));
stream.drawImage(img, 60, 60);
stream.close();
}
I want the image on the first page.
PDFBox is a low-level library to work with PDF files. You are responsible for more high-level features. So in this example, you are placing your image at (60, 60) starting from lower-left corner of your document. That is what stream.drawImage(img, 60, 60); does.
If you want to move your image somewhere else, you have to calculate and provide the wanted location (perhaps from dimensions obtained with page.findCropBox(), or manually input your location).
As for the text, PDF document elements are absolutely positioned. There are no low-level capabilities for re-flowing text, floating or something similar. If you write your text on top of your image, it will be written on top of your image.
Finally, for your page becoming white -- you are creating a new content stream and so overwriting the original one for your page. You should be appending to the already available stream.
The relevant line is:
PDPageContentStream stream = new PDPageContentStream( pdDoc, pages.get(0));
What you should do is call it like this:
PDPageContentStream stream = new PDPageContentStream( pdDoc, pages.get(0), true, true);
The first true is whether to append content, and the final true (not critical here) is whether to compress the stream.
Take a look at AddImageToPDF sample available from PDFBox sources.
Try this
doc = PDDocument.load( inputFileName );
PDXObjectImage ximage = null;
ximage = new PDJpeg(doc, new FileInputStream( image )
PDPage page = (PDPage)doc.getDocumentCatalog().getAllPages().get(0);
PDPageContentStream contentStream = new PDPageContentStream(doc, page, true, true);
contentStream.drawImage( ximage, 425, 675 );
contentStream.close();
This prints the image in first page. If u want to print in all pages just put on a for loop with a condition of number of pages as the limit.
This worked for me well!
So late answer but this is for who works on it in 2020 with Kotlin: drawImage() is getting float values inside itself so try this:
val file = File(getPdfFile(FILE_NAME))
val document = PDDocument.load(file)
val page = document.getPage(0)
val contentStream: PDPageContentStream
contentStream = PDPageContentStream(document, page, true, true)
// Define a content stream for adding to the PDF
val bitmap: Bitmap? = ImageSaver(this).setFileName("sign.png").setDirectoryName("signature").load()
val mediaBox: PDRectangle = page.mediaBox
val ximage: PDImageXObject = JPEGFactory.createFromImage(document, bitmap)
contentStream.drawImage(ximage, mediaBox.width - 4 * 65, 26f)
// Make sure that the content stream is closed:
contentStream.close()
// Save the final pdf document to a file
pdfSaveLocation = "$directoryPDF/$UPDATED_FILE_NAME"
val pathSave = pdfSaveLocation
document.save(pathSave)
document.close()
I am creating a new PDF and running below code in a loop - to add one image per page and below co-ordinates and height and width values work well for me.
where out is BufferedImage reference variable
PDPage page = new PDPage();
outputdocument.addPage(page);
PDPageContentStream contentStream = new PDPageContentStream(outputdocument, page, AppendMode.APPEND, true);
PDImageXObject pdImageXObject = JPEGFactory.createFromImage(outputdocument, out);
contentStream.drawImage(pdImageXObject, 5, 2, 600, 750);
contentStream.close();
This link gives you details about Class PrintImageLocations.
This PrintImageLocations will give you the x and y coordinates of the images.
Usage: java org.apache.pdfbox.examples.util.PrintImageLocations input-pdf