How to convert existing PDF files to A4 size using PDFBox?

I want to set a page size (A4) on an existing document.
I am using PDFBox for watermarking. I used the following link to add the watermark. The watermark text lives in a separate file, which is later added as an overlay layer on top of the original file.
The problem arises when the file with the watermark text has a different page size than the original document to which the watermark is added. In that case the watermark ends up in the wrong position.
Version: I am using PDFBox 1.8. I tried 2.0, but I am more comfortable with 1.8.
Here is the code:
PDDocument originalPdfFile = PDDocument.load(filename);
PDRectangle pdRect=new PDRectangle(595, 842);//Here I am setting height and width in terms of points
List PageList = originalPdfFile.getDocumentCatalog().getAllPages();
int noOfPages=PageList.size();
System.out.println("No of pages in original document="+noOfPages);
PDPage page=new PDPage();
//PDPage page=new PDPage(PDPage.PAGE_SIZE_A4);
//Here also I tried to add page size
for (int i = 0; i < PageList.size(); i++) {
page=(PDPage)PageList.get(i);
System.out.println("Original Document size in page before cropping: "+(i+1)+", Page Resolution: "+page.getMediaBox());
page.setMediaBox(pdRect);
System.out.println("Original Document size in page after cropping: "+(i+1)+", Page Resolution: "+page.getMediaBox());
//System.out.println("Original Document size in page: "+i+", Height: "+page.getMediaBox().getHeight()+",Width: "+page.getMediaBox().getWidth());
PDRectangle rec=page.getMediaBox();
generateWatermarkText(organisationName,rec);
}
HashMap<Integer, String> overlayGuide = new HashMap<Integer, String>();
for(int i=0; i<originalPdfFile.getNumberOfPages(); i++)
{
overlayGuide.put(i+1, "C:/drm/final/final.pdf");
//watermarktext.pdf is the document which is a one page PDF with your watermark image in it.
}
Overlay overlay = new Overlay();
overlay.setInputPDF(originalPdfFile);
overlay.setOutputFile(filename);
overlay.setOverlayPosition(Overlay.Position.FOREGROUND);
overlay.overlay(overlayGuide,false);
//pdf will have the original PDF with watermarks.
The above code adds the watermark successfully, but I am not able to shrink the page.
This line
PDRectangle pdRect=new PDRectangle(595, 842);
crops the page and cuts off its contents, which I don't want. I want the contents to be scaled to fit the page, and the page to have the specified size (A4 in my case).
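Changing the media box only changes the visible rectangle, so fitting the contents also needs a scale factor and an offset. The following is a minimal sketch of that arithmetic only, not a PDFBox solution: the class `FitToA4` and its method `fit` are hypothetical names, and it assumes the source media box has its lower-left corner at (0, 0). The resulting values would still have to be applied as a transformation placed before the existing page content.

```java
import java.util.Locale;

/** Hypothetical helper: computes how to fit a page onto A4 without cropping. */
final class FitToA4 {
    // A4 in PDF user units (points), the same values used in the question.
    static final float A4_WIDTH = 595f;
    static final float A4_HEIGHT = 842f;

    /** Returns {scale, offsetX, offsetY} for a source page of the given size. */
    static float[] fit(float srcWidth, float srcHeight) {
        // Use the smaller ratio so that both dimensions fit and the
        // aspect ratio is preserved.
        float scale = Math.min(A4_WIDTH / srcWidth, A4_HEIGHT / srcHeight);
        // Center the scaled content on the A4 page.
        float offsetX = (A4_WIDTH - srcWidth * scale) / 2f;
        float offsetY = (A4_HEIGHT - srcHeight * scale) / 2f;
        return new float[] {scale, offsetX, offsetY};
    }

    public static void main(String[] args) {
        // Assumed example: an A3 source page (842 x 1191 points).
        float[] t = fit(842f, 1191f);
        System.out.printf(Locale.ROOT, "scale=%.4f dx=%.2f dy=%.2f%n", t[0], t[1], t[2]);
    }
}
```

Taking the smaller of the two ratios keeps the whole page visible; taking the larger one would fill the A4 page but crop the content again.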

Related

ImageMagick.Net - convert pdf to tiff

I am running into an issue when converting from pdf to tiff. Here is the code I used (based on a sample provided in the documentation):
private void convImageMx(string pdfFile)
{
var settings = new MagickReadSettings();
// Settings the density to 300 dpi will create an image with a better quality
settings.Density = new Density(300, 300);
settings.ColorType = ColorType.TrueColor;
string tifpath = Path.GetDirectoryName(pdfFile) + "\\" + Path.GetFileNameWithoutExtension(pdfFile);
using (var images = new MagickImageCollection())
{
// Add all the pages of the pdf file to the collection
images.Read(pdfFile, settings);
var page = 1;
foreach (var image in images)
{
// Write page to file that contains the page number
image.Format = MagickFormat.Ptif;
image.Crop(image.Width, image.Height);
image.Write(tifpath + "_p_" + page + ".tif");
page++;
}
}
}
When I provide a multi-page PDF as input, I get multiple TIFF files, one file per page. However, each file contains 7 pages, which are progressively smaller copies of the original page, and the files are very large (the original PDF is 328 KB, while a single TIFF is 67 MB!).
I think I need to set the compression and crop properties correctly, but I did not find any documentation for .NET.
[EDIT] I commented out the line that sets the density, which fixed the size issue. However, the repeated images are still an issue.

Decrease font size of existing text in PDF

I have a huge PDF with more than 1000 pages and need to edit existing text. Of course, the text was added by me, using the PDFBox add-text example on each page ... but the font size was too big, so the text runs off the page.
Now I want to decrease the font size so that the text stays within the page limits, or alternatively clear the existing text and add the same text again with a new font.
Credit to Tilman Hausherr for the answer:
If you used the code you linked to, then you will find the "added message" in the content stream array, as the second-to-last item. Get the array with PDPage.getCOSObject().getItem(COSName.CONTENTS), remove that item, and save the file.
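public void removeStamp(File src) throws IOException {
    // try-with-resources closes the document even if saving fails
    try (PDDocument doc = PDDocument.load(src)) {
        for (PDPage page : doc.getPages()) {
            // The page's /Contents entry is expected to be an array of content streams
            COSArray array = (COSArray) page.getCOSObject().getItem(COSName.CONTENTS);
            // Remove the last content stream in the array
            array.remove(array.size() - 1);
        }
        doc.save(src);
    }
}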

Increase left margin of an existing pdf using iTextSharp [duplicate]

My web application signs PDF documents. I would like to let users download the original (unsigned) PDF document, with an image and the list of signers added in the left margin of each page.
I've seen this idea in another web application, and I would like to do the same, preferably using the iText library.
I have attached two images, the original PDF document (not signed) and the modified PDF document.
First this: it is important to change the document before you digitally sign it. Once digitally signed, these changes will break the signature.
I will break up the question in two parts and I'll skip the part about the actual watermarking as this is already explained here: How to watermark PDFs using text or images?
This question is not a duplicate of that question, because of the extra requirement to add an extra margin to the left.
Take a look at the primes.pdf document. This is the source file we are going to use in the AddExtraMargin example with the following result: primes_extra_margin.pdf. As you can see, a half-inch margin was added to the left of each page.
This is how it's done:
public void manipulatePdf(String src, String dest) throws IOException, DocumentException {
PdfReader reader = new PdfReader(src);
int n = reader.getNumberOfPages();
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
// properties
PdfContentByte over;
PdfDictionary pageDict;
PdfArray mediabox;
float llx, lly, ury;
// loop over every page
for (int i = 1; i <= n; i++) {
pageDict = reader.getPageN(i);
mediabox = pageDict.getAsArray(PdfName.MEDIABOX);
llx = mediabox.getAsNumber(0).floatValue();
lly = mediabox.getAsNumber(1).floatValue();
ury = mediabox.getAsNumber(3).floatValue();
mediabox.set(0, new PdfNumber(llx - 36));
over = stamper.getOverContent(i);
over.saveState();
over.setColorFill(new GrayColor(0.5f));
over.rectangle(llx - 36, lly, 36, ury - lly);
over.fill();
over.restoreState();
}
stamper.close();
reader.close();
}
The PdfDictionary we get with the getPageN() method is called the page dictionary. It has plenty of information about a specific page in the PDF. We are only looking at one entry: the /MediaBox. This is only a proof of concept. If you want to write a more robust application, you should also look at the /CropBox and the /Rotate entry. Incidentally, I know that these entries don't exist in primes.pdf, so I am omitting them here.
The media box of a page is an array with four values that represent a rectangle defined by the coordinates of its lower-left and upper-right corner (usually, I refer to them as llx, lly, urx and ury).
In my code sample, I change the value of llx by subtracting 36 user units. If you compare the page size of both PDFs, you'll see that we've added half an inch.
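The media box arithmetic can be checked without iText. The sketch below models the media box as a plain `float[]` of {llx, lly, urx, ury}; the class `ExtraMargin` and its methods are hypothetical names introduced for illustration, and the page size in `main` is an assumed example. It relies on the default PDF user unit of 1/72 inch, so 36 units are half an inch.

```java
/** Hypothetical helper: widens a media box to the left and describes the new strip. */
final class ExtraMargin {
    // Default PDF user space: 72 units per inch.
    static final float UNITS_PER_INCH = 72f;

    /** Returns a copy of {llx, lly, urx, ury} with llx moved left by margin. */
    static float[] addLeftMargin(float[] mediaBox, float margin) {
        float[] widened = mediaBox.clone();
        widened[0] = mediaBox[0] - margin;
        return widened;
    }

    /** Returns {x, y, width, height} of the strip covered by the new margin. */
    static float[] marginStrip(float[] mediaBox, float margin) {
        float llx = mediaBox[0];
        float lly = mediaBox[1];
        float ury = mediaBox[3];
        // The strip spans the full page height, from lly up to ury.
        return new float[] {llx - margin, lly, margin, ury - lly};
    }

    public static void main(String[] args) {
        float[] page = {0f, 0f, 595f, 842f}; // assumed example page
        float[] widened = addLeftMargin(page, 36f);
        float addedWidth = (widened[2] - widened[0]) - (page[2] - page[0]);
        System.out.println("Added width in inches: " + addedWidth / UNITS_PER_INCH);
    }
}
```

Because only llx changes, the existing content keeps its coordinates and the new strip appears at negative x values for a page whose media box started at 0.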
We also use these coordinates to draw a rectangle that covers the extra half inch. Now switch to the other watermark examples to find out how to add text or other content to each page.
Update:
If you need to scale down the existing pages, please read Fix the orientation of a PDF in order to scale it.

Splitting at a specific point in PDFBox

I would like to generate a new PDF by concatenating certain individual pages, but the last page has to be split at a certain point (i.e. all content above a limit is included and everything below is excluded; I only care about the items whose upper-left corner is above a line). Is that possible using PDFBox?
One way to achieve the task, i.e. to split a page at a certain point (i.e. all content above a limit to be included and everything below to be excluded) would be to prepend a clip path.
You can use this method:
void clipPage(PDDocument document, PDPage page, BoundingBox clipBox) throws IOException
{
PDPageContentStream pageContentStream = new PDPageContentStream(document, page, true, false);
pageContentStream.addRect(clipBox.getLowerLeftX(), clipBox.getLowerLeftY(), clipBox.getWidth(), clipBox.getHeight());
pageContentStream.clipPath(PathIterator.WIND_NON_ZERO);
pageContentStream.close();
COSArray newContents = new COSArray();
COSStreamArray contents = (COSStreamArray) page.getContents().getStream();
newContents.add(contents.get(contents.getStreamCount()-1));
for (int i = 0; i < contents.getStreamCount()-1; i++)
{
newContents.add(contents.get(i));
}
page.setContents(new PDStream(new COSStreamArray(newContents)));
}
to clip the given page along the given clipBox. (It first creates a new content stream defining the clip path and then arranges this stream to be the first one of the page.)
E.g. to clip the content of a page along the horizontal line 650 units above the bottom, do this:
PDPage page = ...
PDRectangle cropBox = page.findCropBox();
clipPage(document, page, new BoundingBox(
cropBox.getLowerLeftX(),
cropBox.getLowerLeftY() + 650,
cropBox.getUpperRightX(),
cropBox.getUpperRightY()));
For a running example look here: ClipPage.java.
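The rectangle passed to the clip path has to be expressed as x, y, width and height. The sketch below shows only that calculation in plain Java, with the crop box modelled as a `float[]` of {llx, lly, urx, ury}; the class `ClipAboveLine` and its methods are hypothetical names, and the Letter-sized page in `main` is an assumed example.

```java
/** Hypothetical helper: computes the region of a page that lies above a cut line. */
final class ClipAboveLine {

    /** Returns {x, y, width, height} of the part of cropBox above lly + offsetFromBottom. */
    static float[] clipRect(float[] cropBox, float offsetFromBottom) {
        float llx = cropBox[0];
        float lly = cropBox[1];
        float urx = cropBox[2];
        float ury = cropBox[3];
        // Clamp the cut line so that it never leaves the crop box.
        float cutY = Math.max(lly, Math.min(ury, lly + offsetFromBottom));
        return new float[] {llx, cutY, urx - llx, ury - cutY};
    }

    /** True if the point (x, y) lies inside the rectangle {x, y, width, height}. */
    static boolean contains(float[] rect, float x, float y) {
        return x >= rect[0] && x <= rect[0] + rect[2]
            && y >= rect[1] && y <= rect[1] + rect[3];
    }

    public static void main(String[] args) {
        float[] cropBox = {0f, 0f, 612f, 792f}; // assumed example page
        float[] rect = clipRect(cropBox, 650f);
        System.out.println("Clip height: " + rect[3]);
        System.out.println("Point at y=700 kept: " + contains(rect, 100f, 700f));
    }
}
```

A clip path cuts through anything that straddles the line. The `contains` check is one way to test whether an item's upper-left corner falls in the kept region, assuming that corner is known.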

iTextSharp rotated PDF page reverts orientation when file is rasterized at print house

Using iTextSharp I am creating a PDF composed of a collection of existing PDFs. Some of the included PDFs are in landscape orientation and need to be rotated, so I do the following:
private static void AdjustRotationIfNeeded(PdfImportedPage pdfImportedPage, PdfReader reader, int documentPage)
{
float width = pdfImportedPage.Width;
float height = pdfImportedPage.Height;
if (pdfImportedPage.Rotation != 0)
{
PdfDictionary pageDict = reader.GetPageN(documentPage);
pageDict.Put(PdfName.ROTATE, new PdfNumber(0));
}
if (width > height)
{
PdfDictionary pageDict = reader.GetPageN(documentPage);
pageDict.Put(PdfName.ROTATE, new PdfNumber(270));
}
}
This works great. The included PDFs are rotated to portrait orientation if needed, and the PDF prints correctly on my local printer.
This file is sent to a fulfillment house, and unfortunately, the included landscape files do not print properly when going through their printer and rasterization process. They use Kodak (Creo) NexRip 11.01 or Kodak (Creo) Prinergy 6.1 machines. The fulfillment house's suggestion is to: "generate a new PDF file after we rotate pages or make any changes to a PDF. It is as easy as exporting out to a PostScript and distilling back to a PDF."
I know iTextSharp doesn't support PostScript. Is there another way iTextSharp can rotate included PDFs to hold the orientation when rasterized?
First let me assure you that changing the rotation in the page dictionary is the correct procedure to achieve what you want. As far as I can see your code, there's nothing wrong with it. You are doing the right thing.
Unfortunately, you are faced with a third party product over which you have no control that is not doing the right thing. How to solve this?
I have written an example called IncorrectExample. I have named it that way because I don't want it to be used in a context that is different from yours. You can safely ignore all the warnings I added: they are not meant for you. This example is very specific to your problem.
Please try the following code:
public void manipulatePdf(String src, String dest)
throws IOException, DocumentException {
// Creating a reader
PdfReader reader = new PdfReader(src);
// step 1
Rectangle pagesize = getPageSize(reader, 1);
Document document = new Document(pagesize);
// step 2
PdfWriter writer
= PdfWriter.getInstance(document, new FileOutputStream(dest));
// step 3
document.open();
// step 4
PdfContentByte cb = writer.getDirectContent();
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
pagesize = getPageSize(reader, i);
document.setPageSize(pagesize);
document.newPage();
PdfImportedPage page = writer.getImportedPage(reader, i);
if (isPortrait(reader, i)) {
cb.addTemplate(page, 0, 0);
}
else {
cb.addTemplate(page, 0, 1, -1, 0, pagesize.getWidth(), 0);
}
}
// step 5
document.close();
reader.close();
}
public Rectangle getPageSize(PdfReader reader, int pagenumber) {
Rectangle pagesize = reader.getPageSizeWithRotation(pagenumber);
return new Rectangle(
Math.min(pagesize.getWidth(), pagesize.getHeight()),
Math.max(pagesize.getWidth(), pagesize.getHeight()));
}
public boolean isPortrait(PdfReader reader, int pagenumber) {
Rectangle pagesize = reader.getPageSize(pagenumber);
return pagesize.getHeight() > pagesize.getWidth();
}
I have taken the pages.pdf file as an example. This file is special in the sense that it has two pages in landscape that are created in a different way:
one page is a page of which the width is smaller than the height (sounds like it's a page in portrait), but as there's a /Rotate value of 90 added to the page dictionary, it is shown in landscape.
the other page isn't rotated, but it has a height that is smaller than the width.
In my example, I am using the classes Document and PdfWriter to create a copy of the original document. This is wrong in general because it throws away all interaction. I should use PdfStamper or PdfCopy instead, but it is right in your specific case because you don't need the interactivity: the final purpose of the PDF is to be printed.
With Document, I create new pages using a new Rectangle that uses the lowest value of the dimensions of the existing page as the width and the highest value as the height. This way, the page will always be in portrait. Note that I use the method getPageSizeWithRotation() to make sure I get the correct width and height, taking into account any possible rotation.
I then add a PdfImportedPage to the direct content of the writer. I use the isPortrait() method to find out if I need to rotate the page or not. Observe that the isPortrait() method looks at the page size without taking into account the rotation. If we did take into account the rotation, we'd rotate pages that don't need rotating.
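The six numbers passed to addTemplate() for landscape pages form a transformation matrix: a rotation by 90 degrees counterclockwise followed by a shift to the right by the width of the new page. The sketch below applies that matrix to the corners of an unrotated landscape page in plain Java, without iText; the class `PortraitPlacement` and its methods are hypothetical names, and the 842 x 595 page is an assumed example.

```java
/** Hypothetical helper: shows where the matrix (0, 1, -1, 0, width, 0) moves a point. */
final class PortraitPlacement {

    /** Applies the matrix [a b c d e f]: x' = a*x + c*y + e, y' = b*x + d*y + f. */
    static float[] apply(float a, float b, float c, float d, float e, float f,
                         float x, float y) {
        return new float[] {a * x + c * y + e, b * x + d * y + f};
    }

    /** Maps a point of an unrotated landscape page onto the portrait page. */
    static float[] toPortrait(float x, float y, float portraitWidth) {
        return apply(0f, 1f, -1f, 0f, portraitWidth, 0f, x, y);
    }

    public static void main(String[] args) {
        // Assumed example: landscape page 842 wide and 595 high,
        // so the portrait page is 595 wide and 842 high.
        float portraitWidth = 595f;
        float[][] corners = {{0f, 0f}, {842f, 0f}, {0f, 595f}, {842f, 595f}};
        for (float[] corner : corners) {
            float[] p = toPortrait(corner[0], corner[1], portraitWidth);
            System.out.println("(" + corner[0] + ", " + corner[1] + ") -> ("
                    + p[0] + ", " + p[1] + ")");
        }
    }
}
```

Every mapped corner lands inside the 595 x 842 portrait page. The translation by the page width is what brings the rotated content back into view, since the rotation alone would place it at negative x values.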
The resulting PDF can be found here: pages_changed.pdf
As you can see, some information got lost: there was an annotation on the final page: it's gone. There were specific viewer preferences defined for the original document: they're gone. But that shouldn't matter in your specific case, because all that matters for you is that the pages are printed correctly.