Cropping a region from a PDF page with PDFBox

I am trying to crop a region out of a PDF page programmatically. Specifically, my input is going to be a single page PDF and a bounding box on the page. Output is going to be a PDF that contains the characters, graphics paths and images from the original PDF, and it should look like the original PDF. In other words, I want a function that is similar to cropping a region out of an image, but with PDFs.
Three questions:
1- Is it possible at all? From what I know of PDFs it seems possible, but I'm no expert, so I would first like to know whether I'm missing something.
2- Is there any open source software for this?
3- Can PDFBox do this currently? I couldn't find such functionality, but I might have missed it. Does anybody know of any attempt at doing this?

1- Yes, this is called the crop box.
2- Yes, e.g. PDFBox.
3- Yes, just open a PDF, set a crop box, and save it:
PDDocument doc = PDDocument.load(new File(...));
PDPage page = doc.getPage(0);
page.setCropBox(new PDRectangle(20, 20, 200, 400));
doc.save(...);
doc.close();
The numbers in PDRectangle are x, y, width and height, in user space units. 1 unit = 1/72 inch.
Note that the contents outside the crop box are not gone; they are just hidden.
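A common stumbling block when choosing those numbers: bounding boxes taken from an image are usually measured from the top-left corner, while PDF user space by default has its origin at the bottom-left of the page. The sketch below is plain Java with no PDFBox dependency. The names CropBoxMath and fromTopLeft are made up for illustration, and it assumes an unrotated page whose media box starts at (0, 0).

```java
import java.util.Arrays;

final class CropBoxMath {
    /**
     * Converts an image-style box (x from the left edge, y from the top edge)
     * into {x, y, width, height} with y measured from the bottom of the page,
     * which is the order new PDRectangle(x, y, width, height) expects.
     * All values are in points (1/72 inch).
     */
    static float[] fromTopLeft(float pageHeight, float left, float top,
                               float width, float height) {
        float lowerLeftY = pageHeight - top - height; // flip the vertical axis
        return new float[] { left, lowerLeftY, width, height };
    }

    public static void main(String[] args) {
        // US Letter is 612 x 792 points. Take a 200 x 400 box whose top-left
        // corner sits 20 points from the left edge and 20 points from the top.
        float[] box = fromTopLeft(792f, 20f, 20f, 200f, 400f);
        System.out.println(Arrays.toString(box));
    }
}
```

The four values can then be handed to page.setCropBox(new PDRectangle(box[0], box[1], box[2], box[3])).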

Related

PDFBox generates blackened lines when I zoom out

When I draw lines using PDFBox, they appear blackened when I zoom out of the generated PDF file.
I'm creating a dashed pattern on a content stream with the line methods (moveTo, lineTo). To set the dash pattern and the line size I use setLineWidth and setLineDashPattern.
You can see the code in my GitHub repo (https://github.com/dmmax/pdfbox-dotted-pattern/blob/master/src/main/java/me/dmmax/pdfbox/dottedpattern/Main.java)
The picture below shows the two files open side by side: my result (left) and an example of how it should look (right). Both are at 50% zoom.
You can also check on your own computer by downloading the two files:
1) My result: https://github.com/dmmax/pdfbox-dotted-pattern/blob/master/print.pdf
2) Example: https://github.com/dmmax/pdfbox-dotted-pattern/blob/master/informationyoushouldknow.pdf
Does anyone know how to fix the blackened lines when I zoom out of the resulting PDF?
Many thanks to @TilmanHausherr for his help with this question.
If lines turn black when you zoom out of a PDF, it is because the PDF contains a lot of small objects that the viewer still renders at the same (or nearly the same) size when zoomed out.
I solved the problem by generating the dot/dash pattern (with the required number of lines) in a separate PDF, converting that PDF to an XObject, and drawing it onto my current PDF.
Yes, it takes up more space, but there are no blackouts.

Printing a PDF document on paper with a predefined layout

We need to print a PDF document on paper that already has predefined fields on it, basically a form whose fields need to be filled in.
We are using iTextSharp to create the PDFs, and we position elements absolutely based on the positions of the form fields. For instance, if a field starts 20 mm from the left and 20 mm from the top, I place the data 21 mm from the left and 21 mm from the top so that it fits inside the field. This works well on my printer.
My question is: can different printers mess up the positioning because of different margins, font sizes, and so on? Maybe the result will be the same everywhere; I don't know what differences between printers to expect.
Is it important that the user chooses the Actual size option when printing the PDF?
I need to know what difficulties I can expect. Better to know now than to wait for customers to call once this is in production.
The problem you anticipate exists. It can be avoided by setting a viewer preference.
See How to prevent the resizing of pages in PDF?
You have to set the print scaling to none:
writer.addViewerPreference(PdfName.PRINTSCALING, PdfName.NONE);
That's the line you'll need if you are using iText 5 (writer is an instance of PdfWriter). If you are using iText 7, you can define the viewer preferences like this:
PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
PdfViewerPreferences preferences = new PdfViewerPreferences();
preferences.setPrintScaling(PdfViewerPreferencesConstants.NONE);
pdf.getCatalog().setViewerPreferences(preferences);
See Handling events; setting viewer preferences and printer properties.
Of course, end users can always overrule the print scaling in their PDF viewer, but that's their responsibility, not yours.
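To get a feel for how far a "fit to page" option can move things, here is a small sketch of the arithmetic in plain Java (no iText dependency). The class and method names are hypothetical, the 97% scale factor is only an example value and not something a particular viewer is known to use, and the sketch assumes the viewer centers the scaled page on the paper.

```java
final class PrintOffsets {
    static final double POINTS_PER_MM = 72.0 / 25.4; // 1 inch = 25.4 mm = 72 points

    /** Converts millimetres to PDF user space units (points). */
    static double mmToPoints(double mm) {
        return mm * POINTS_PER_MM;
    }

    /**
     * Distance from the left paper edge (in mm) at which a mark placed at
     * positionMm ends up when the page is scaled by `scale` and centered
     * on paper that is paperWidthMm wide.
     */
    static double printedPositionMm(double positionMm, double paperWidthMm, double scale) {
        double margin = paperWidthMm * (1.0 - scale) / 2.0; // blank strip added on each side
        return margin + positionMm * scale;
    }

    public static void main(String[] args) {
        System.out.println("21 mm in points: " + mmToPoints(21));
        // A4 paper is 210 mm wide.
        System.out.println("actual size:   " + printedPositionMm(21, 210, 1.0) + " mm");
        System.out.println("scaled to 97%: " + printedPositionMm(21, 210, 0.97) + " mm");
    }
}
```

Under these assumptions a mark meant to land 21 mm from the left edge lands about 23.5 mm from it at 97% scaling, which is enough to miss a field with 1 mm of slack.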

move PDF content using PDFBox

I need to be able to specify a rectangular area on a PDF page and move the text and graphic content of that area to a new location on the same page using PDFBox. Any graphics (lines, pictures, etc) will each move as a whole unit if selected in the area.
The PDF documents being modified originate as text based PCL and are converted to PDF using a third party tool. I can answer technical questions about these documents if needed.
This Stack Overflow question is exactly what I am after but that question seems to have been abandoned before a working solution was found?
I would bounty this question if I had a few more reputation points.
If you can help with any aspect of this issue I would appreciate your assistance, thank you.
I'm not as familiar with PDFBox as I should be but any library should be able to do the following; I know the one I represent can.
Create a new blank page that's the same size as your original. Copy the content of the original to an XObject and apply that to the blank page. Add a white rectangle to the page to obscure the rectangle in question. Clip the content of the original page to the rectangle you want to "move". Create a second XObject from that. Apply it to the new page in the position you want.
If PDFBox is capable of it, sanitize the new page to remove the hidden content under the white box.
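The geometry behind the last two steps is just a translation. The sketch below uses only java.awt.geom from the JDK, and the class and method names are made up for illustration. How the resulting transform is passed to the PDF library when drawing the second XObject is not shown, because that part depends on the library.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;

final class MoveRegionGeometry {
    /** Translation that carries the lower-left corner of `source` onto (destX, destY). */
    static AffineTransform translationFor(Rectangle2D source, double destX, double destY) {
        return AffineTransform.getTranslateInstance(destX - source.getX(),
                                                    destY - source.getY());
    }

    /** Where the clipped copy ends up on the page: the source rectangle shifted by `move`. */
    static Rectangle2D destinationRect(Rectangle2D source, AffineTransform move) {
        return move.createTransformedShape(source).getBounds2D();
    }

    public static void main(String[] args) {
        // Region to move: 200 x 50 points with its lower-left corner at (100, 500).
        Rectangle2D source = new Rectangle2D.Double(100, 500, 200, 50);
        AffineTransform move = translationFor(source, 300, 100);
        System.out.println("dx = " + move.getTranslateX() + ", dy = " + move.getTranslateY());
        System.out.println("destination = " + destinationRect(source, move));
    }
}
```

The white rectangle covers `source`, while the clip for the second copy has to cover the destination rectangle (or, equivalently, `source` expressed in the already translated coordinate system).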

iText: why would adding an image cause text to appear fuzzy in PDF?

I'm using iText with Java to create a PDF file. I'm trying to place a paragraph on left, and float an image on right (e.g. next to each other). Using the following code does insert the image, but it also makes the text fuzzy on the entire page (other pages are fine).
// add image
Image img = Image.getInstance(imgPath);
img.setAlignment(Image.RIGHT | Image.TEXTWRAP);
img.scaleToFit(1000, 72f); // 1" height
//img.setSpacingBefore(0f); // does not have any effect
document.add(img);
// add text
Paragraph par = new Paragraph("some text here", styleBody);
par.setSpacingBefore(20f);
document.add(par);
If I remove the image portion of the code, the text looks clean. This is my first attempt at adding an image next to text. Must be doing something obviously wrong. Any idea what could cause this?
I was able to solve this problem. The code above is perfectly fine. The problem was I was using a PNG image with transparency. When I removed the transparency (by re-exporting the image from Illustrator with transparency turned off), I was able to create PDFs with clear text.
I think the transparency forces the PDF page to be written in CMYK color scheme rather than RGB, which perhaps causes this issue.
Hope this helps someone else. I searched everywhere but couldn't find any leads talking about fuzzy text in iText.

Relative crop with Ghostscript

I want to crop all pages by 1 inch, as with Adobe Acrobat:
but I can't find such a switch in the Ghostscript reference.
Is there an easy way to crop pages relatively by 1 inch without knowing page dimension upfront?
Short answer: no.
Note that the dialog you have posted doesn't actually crop pages per se; it sets the CropBox of the PDF file. Ghostscript's pdfwrite device does NOT 'edit' PDF files, so you cannot simply alter the CropBox of the original PDF file using Ghostscript.
Long answer: what you can do is create a brand new PDF file which should look the same as the original, and you can create that PDF file with a different CropBox. The PDF interpreter must know the MediaBox (and other boxes) from the PDF file, because, obviously, it needs to know how big the original is. That means you can write PostScript to alter it.
But doing this isn't simple, especially not if you expect each page in the file to have a different MediaSize, and hence a different CropBox.
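Whatever tool ends up writing the new CropBox, the per-page arithmetic is the same: move each edge of that page's MediaBox inwards by 72 points. The sketch below shows only that calculation in plain Java. It is not a Ghostscript feature, and the class and method names are hypothetical.

```java
import java.util.Arrays;

final class RelativeCrop {
    static final float POINTS_PER_INCH = 72f;

    /**
     * Shrinks a box given as {llx, lly, urx, ury} (the order used in a PDF
     * /MediaBox array) by `inches` on every side and returns the new box.
     */
    static float[] inset(float[] mediaBox, float inches) {
        float d = inches * POINTS_PER_INCH;
        float llx = mediaBox[0] + d;
        float lly = mediaBox[1] + d;
        float urx = mediaBox[2] - d;
        float ury = mediaBox[3] - d;
        if (llx >= urx || lly >= ury) {
            throw new IllegalArgumentException("page is too small for this inset");
        }
        return new float[] { llx, lly, urx, ury };
    }

    public static void main(String[] args) {
        // Pages of different sizes each get their own crop box.
        float[] letter = { 0f, 0f, 612f, 792f };
        float[] a4 = { 0f, 0f, 595f, 842f };
        System.out.println(Arrays.toString(inset(letter, 1f)));
        System.out.println(Arrays.toString(inset(a4, 1f)));
    }
}
```

Because the inset is computed from each page's own MediaBox, no page dimension has to be known up front, but the calculation does have to be repeated for every page.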