Change PDF page size non-proportionally with Ghostscript

I have a PDF document with many pages of 595x420 pt, but I need to squeeze these pages into 595x210 with all the text still visible.
So, can I scale PDF pages non-proportionally (not just zoom) to fit a custom page size with Ghostscript, or must I use another program?

If you want scaling applied to one axis and not the other, then you will have to do some PostScript programming. In /ghostpdl/Resource/Init/pdf_main.ps is the code which calculates the matrix required:
/pdf_PDF2PS_matrix { % <pdfpagedict> -- matrix
matrix currentmatrix matrix setmatrix exch
% stack: savedCTM <pdfpagedict>
dup get_any_box
% stack: savedCTM <pdfpagedict> /Trim|Crop|Art|MediaBox <Trim|Crop|Art|Media Box>
oforce_elems normrect_elems fix_empty_rect_elems 4 array astore
//systemdict /PDFFitPage known {
PDFDEBUG { (Fiting PDF to imageable area of the page.) = flush } if
That code calculates the x and y scale values and makes them the same. If you want them to differ, that's what you will have to modify. Note you will also have to set a specific media size using -dDEVICEHEIGHTPOINTS and -dDEVICEWIDTHPOINTS and set -dFIXEDMEDIA to prevent the PDF file resizing the media.
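As a rough illustration of the arithmetic only (this is not Ghostscript's own code), the independent x and y scale factors for the page sizes in the question can be worked out as below. The helper name axis_scales is made up for this sketch; the three switches are the ones named in the answer.

```python
def axis_scales(src_w, src_h, dst_w, dst_h):
    """Return independent (sx, sy) factors mapping the source page onto the target."""
    return dst_w / src_w, dst_h / src_h


# Page sizes from the question, in points
sx, sy = axis_scales(595, 420, 595, 210)
print(sx, sy)

# Media-size switches mentioned in the answer, filled in for the target size
switches = ["-dDEVICEWIDTHPOINTS=595", "-dDEVICEHEIGHTPOINTS=210", "-dFIXEDMEDIA"]
print(" ".join(switches))
```

Here the width is left alone and only the height is halved, which is exactly the case where the two factors computed in pdf_PDF2PS_matrix would have to differ.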

Related

pdf2image converts page with page information that is not visible in the pdf viewer

I convert a PDF to images using pdf2image, which is a Python package.
But in the result, PDF page information(?) that is not visible in the PDF viewer appears.
How can I remove this page information from the PDF itself, not from the image?
PDF file link is https://1drv.ms/b/s!Ar1AW_VI_HwvkMAOyDmQhFEKrZnRWg?e=fvWEwN
The PDF page data for viewing is constrained by a "crop box" or "trim box", which in most cases is identical to the paper "media box". However, when crop marks are used for printing or display, the crop box area will be smaller than the media box area.
pdf2image has a setting to cover the use of crop boxes, use_cropbox=True (the default is False), so you would need to set that argument in your invocation.
use_cropbox
Uses the PDF cropbox instead of the default mediabox. This is a rather dark feature that should be set to true when the module does not seem to work with your data.
However, looking into the file, the values have been altered from what you would expect. A source page is defined as
<< /CropBox [ 0 0 676 855] /MediaBox [ 0 0 676 856]...
so there would be no noticeable difference; the 1 unit is only 1/72".
But 48 pages later have additional (LaTeX?) crop box values of
<</CropBox[32.4 32.4 643.6 823.6]... and this seems to affect the trimmed viewport.
pdfinfo filename.pdf reports the cropped area: Page size: 611.2 x 791.2 pts (letter)
Because there are two conflicting settings, and without a working pdf2image set-up for testing, I am not 100% confident that use_cropbox=True will always work reliably.
There are other methods that might work better: Ghostscript and the other applications that Python packages depend on have similar, or alternative, means to clip the image output directly from the file. Using poppler directly, we would get the same default output.
However if we specify -cropbox the secondary crop, in this case, will be taken into account.
pdftoppm -png -cropbox "process data sheet.pdf" output
If that did not work we would need to define the exact area using
-x <int> : x-coordinate of the crop area top left corner
-y <int> : y-coordinate of the crop area top left corner
-W <int> : width of crop area in pixels (default is 0)
-H <int> : height of crop area in pixels (default is 0)
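To get values for those four options, the crop box (given in points, origin at the bottom left) has to be converted to pixels with the origin at the top left. The sketch below shows one way to do that arithmetic. The function name crop_args is invented here, and it assumes the dpi value matches the resolution used for rendering (for example via pdftoppm's -r option) and that the page's media box is the [0 0 676 856] quoted above.

```python
def crop_args(crop_box, media_box, dpi):
    """Convert a PDF box in points (origin bottom-left) to pixel values
    for -x/-y/-W/-H (origin top-left)."""
    llx, lly, urx, ury = crop_box
    media_top = media_box[3]
    scale = dpi / 72.0  # 72 points per inch
    return {
        "x": round(llx * scale),
        "y": round((media_top - ury) * scale),
        "W": round((urx - llx) * scale),
        "H": round((ury - lly) * scale),
    }


# Boxes quoted in the answer; 144 dpi is just an example resolution
args = crop_args([32.4, 32.4, 643.6, 823.6], [0, 0, 676, 856], dpi=144)
print(" ".join(f"-{key} {value}" for key, value in args.items()))
```

At 72 dpi the width and height come out as 611 and 791 pixels, in line with the 611.2 x 791.2 pts that pdfinfo reports.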

Images rotated when added to PDF in itext7

I'm using the following extension method I built on top of itext7's com.itextpdf.layout.Document type to apply images to PDF documents in my application:
fun Document.writeImage(imageStream: InputStream, page: Int, x: Float, y: Float, width: Float, height: Float) {
val imageData = ImageDataFactory.create(imageStream.readBytes())
val image = Image(imageData)
val pageHeight = pdfDocument.getPage(page).pageSize.height
image.scaleAbsolute(width, height)
val lowerLeftX = x
val lowerLeftY = pageHeight - y - image.imageScaledHeight
image.setFixedPosition(page, lowerLeftX, lowerLeftY)
add(image)
}
Overall, this works -- but with one exception! I've encountered a subset of documents where the images are placed as if the document origin is rotated 90 degrees. Even though the content of the document is presented properly oriented underneath.
Here is a redacted copy of one of the PDFs I'm experiencing this issue with. I'm wondering if anyone would be able to tell me why itext7 is having difficulties writing to this document, and what I can do to fix it -- or alternatively, if it's a potential bug in the higher level functionality of com.itextpdf.layout in itext7?
Some Additional Notes
I'm aware that drawing on a PDF works via a series of instructions concatenated to the PDF. The code above works on other PDFs we've had issues with in the past, so com.itextpdf.layout.Document does appear to be normalizing the coordinate space prior to drawing. Thus, the issue I describe above seems to be going undetected by itext?
The rotation metadata in the PDF that itext7 reports from a "good" PDF without this issue seems to be the same as the rotation metadata in PDFs like the one I've linked above. This means I can't perform some kind of brute-force fix through detection.
I would love any solution to not require me to flatten the PDF through any form of broad operation.
I can only talk about the document you've shared.
It contains 4 pages.
The /Rotate property of the first page is 0; for the other pages it is 270 (which means a 90-degree counterclockwise rotation).
iText does indeed try to normalize the coordinate space for each page.
That's why, when you add an image to pages 2-4 of the document, it is rotated by 270 degrees (90 counterclockwise).
... Even though the content of the document is presented properly oriented underneath.
Content of pages 2-4 looks like
q
0 -612 792 0 0 612 cm
/Im0 Do
Q
This is an image with applied transformation.
0 -612 792 0 0 612 cm represents the composite transformation matrix.
From ISO 32000
A transformation matrix in PDF shall be specified by six numbers,
usually in the form of an array containing six elements. In its most
general form, this array is denoted [a b c d e f]; it can represent
any linear transformation from one coordinate system to another.
We can extract a rotation from that matrix.
You can find how to decompose the matrix here:
https://math.stackexchange.com/questions/237369/given-this-transformation-matrix-how-do-i-decompose-it-into-translation-rotati
The rotation is defined by the following matrix
0 -1
1 0
This is a rotation by -90 (270) degrees.
Important note: in this case a positive angle means counterclockwise rotation.
ISO 32000
Rotations shall be produced by [rc rs -rs rc 0 0], where rc = cos(q)
and rs = sin(q) which has the effect of rotating the coordinate system
axes by an angle q counter clockwise.
So the image has been rotated by the same angle as the page, but in the opposite direction.
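The extraction of the rotation from the cm operands can be sketched as follows. It relies on the quoted ISO 32000 form [rc rs -rs rc 0 0], so the angle is atan2(b, a); the function name rotation_degrees is made up for this illustration, and the sketch assumes positive scale factors and no skew.

```python
import math


def rotation_degrees(a, b, c, d, e, f):
    """Rotation angle (counterclockwise positive) of a PDF matrix [a b c d e f].

    With a = sx*cos(q) and b = sx*sin(q), atan2(b, a) recovers q
    as long as the x scale factor sx is positive.
    """
    return math.degrees(math.atan2(b, a))


# Operands of "0 -612 792 0 0 612 cm" from pages 2-4
print(rotation_degrees(0, -612, 792, 0, 0, 612))  # prints -90.0
```

The remaining values, 612 and 792, are the scale factors that stretch the unit-square image to the page size.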

How to get DPI of a PDF file?

Using ImageMagick or GhostScript or any PHP code how can I get the DPI value of PDF files?
Here is the link for two demo files
http://jmp.sh/O5g5wL4 -- of 72 DPI
http://jmp.sh/RxrnYrY -- of 300 DPI
I have used
$image = new Imagick();
$image->readImage('xyz.pdf');
$resolutions = $image->getImageResolution();
It gives the same result for two different PDF files having different DPI.
I have also used
pdfimages -list xyz.pdf
It gives a list of all the information, but how do I fetch the DPI value from the list?
How to get the exact DPI value of a PDF?
As fmw42 says, PDF files themselves have no resolution. However, in your case both files consist of nothing but an image. In one case the image is ~48 MB and in the other it's around 200 MB.
The reason is that the images have a different effective resolution.
In PDF the image is simply a bitmap, a sequence of coloured pixels. These are then drawn onto the underlying media. At this point there is no resolution; the pixels are laid down in a specific media size, in your case roughly 22 inches by 82 inches.
The effective resolution is given by dividing the number of pixels in the image in a given dimension by the size of the area it covers in that dimension.
So if I have an image which is 1000x1000 pixels, and I draw it in a 1 inch square, then the effective resolution of the image is 1000 dpi. If I change my mind and draw it in a square 4 inches by 4 inches, then the effective resolution is 250 dpi.
The image hasn't changed, just the area it covers.
Now consider I have two images drawn in 1 inch squares. The first image is 1000x1000, the second is 500x500. The effective resolution of the first image is 1000 dpi, the effective resolution of the second is 500 dpi.
So you can see that, in PDF, the effective resolution of the image is a combination of the dimensions of the image, and the dimensions of the media it covers.
That's a difficult thing to measure in a PDF file. The area covered is calculated using matrix algebra and can be a combination of several different matrices.
The actual dimensions of the image, by contrast, are quite easy to determine; they are given in the image dictionary. Your images are 1620x5868 and 3372x12225. In both cases the media is the same size: 22.5x81.5 inches.
Since the images cover the entire media, the effective resolutions are:
1620/22.5 = 72 by 5868/81.5 = 72
3372/22.5 = 149.866 by 12225/81.5 = 150
I think MuPDF will give you image dimensions and media dimensions. Assuming all your PDF files are constructed like this, you can then simply perform the maths, but note that this won't be so simple for ordinary PDF files where images don't cover the entire media.
Using mutool info -I -M 150-dpi.pdf gives:
Retrieving info from pages 1-1...
Mediaboxes (1):
1 (6 0 R): [ 0 0 1620 5868 ]
Images (1):
1 (6 0 R): [ DCT ] 3375x12225 8bpc DevCMYK (12 0 R)
So there's your image dimensions and your media size. The media box is in points (1/72 inch), so 1620x5868 is 22.5x81.5 inches. All you need to do is divide one by the other.
Note: In Debian and related distros, mutool is contained in the mupdf-tools package, not in the mupdf package itself. It can therefore be installed with sudo apt install mupdf-tools.
I use pdfimages -list from the poppler library, which gives you all the information about the images.
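The division can be written out as a small sketch. It only holds under the assumption stated above, that the image covers the entire media; the function name effective_dpi is made up here, and the numbers are taken from the mutool output.

```python
POINTS_PER_INCH = 72.0


def effective_dpi(image_px, media_pt):
    """Effective (x, y) resolution of an image that covers the whole media.

    image_px: (width, height) in pixels from the image dictionary
    media_pt: (width, height) of the media box in points
    """
    (px_w, px_h), (pt_w, pt_h) = image_px, media_pt
    return (px_w / (pt_w / POINTS_PER_INCH), px_h / (pt_h / POINTS_PER_INCH))


# 72 dpi file: image pixels equal the media size in points
print(effective_dpi((1620, 5868), (1620, 5868)))
# 150 dpi file: image size as reported by mutool
print(effective_dpi((3375, 12225), (1620, 5868)))
```

For ordinary PDF files the media size would have to be replaced by the area the image actually covers, which is the hard part described earlier.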

Resizing multi-page mixed-format PDF with Ghostscript?

I have multi-page PDF-files with mixed formats A4 (portrait) - A0 (landscape).
Is Ghostscript capable of resizing the pages with size >A3 to A3 – but leaving the pages with smaller size (A4) not to be resized?
First, Ghostscript doesn't manipulate the input; you should read ghostpdl/doc/vectordevices.htm to see how Ghostscript and the pdfwrite device actually work.
Out of the box, no: Ghostscript and the pdfwrite device won't let you produce output whose media size differs from the input on a per-page basis (you can have it produce output sized to a single media size). It can, of course, be done, but it will involve some programming, and in PostScript at that.
You would probably want to look at the pdf_PDF2PS_matrix routine in ghostpdl/Resource/Init/pdf_main.ps:
% Compute the matrix that transforms the PDF->PS "default" user space
/pdf_PDF2PS_matrix { % <pdfpagedict> -- matrix
...
Which calculates the scale factors required when resizing content to fit the media.
Also pdfshowpage_setpage:
/pdfshowpage_setpage { % <pagedict> pdfshowpage_setpage <pagedict>
6 dict begin % for setpagedevice
% Stack: pdfpagedict
...
Which is where the selection of the media size takes place.
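The per-page decision the question asks for (leave pages up to A3 alone, shrink anything larger) comes down to a small calculation, sketched here outside Ghostscript. The name fit_scale and the rounded A3 size in points are assumptions of this sketch, and it only computes the uniform factor that the PostScript code would need to apply.

```python
A3_PT = (842, 1191)  # A3 (297 x 420 mm) in points, short side first, rounded


def fit_scale(width_pt, height_pt, limit=A3_PT):
    """Return 1.0 for pages that already fit within A3 (either orientation),
    otherwise the uniform scale factor that shrinks the page to fit."""
    short, long_ = sorted((width_pt, height_pt))
    if short <= limit[0] and long_ <= limit[1]:
        return 1.0
    return min(limit[0] / short, limit[1] / long_)


print(fit_scale(595, 842))    # A4 portrait: left unchanged
print(fit_scale(3370, 2384))  # A0 landscape: scaled down to fit A3
```

Sorting the sides first means portrait and landscape pages are treated the same way, so orientation is preserved and only the size changes.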
After spending a long time looking for a solution, I found a great, and yet affordable, tool capable of doing the resizing and a lot more: PStill (http://www.pstill.com/)

Setting auto-height/width for converted Jpeg from PDF using GhostScript

I am using GS to do conversion from PDF to JPEG and following is the command that I use:
gs -sDEVICE=jpeg -dNOPAUSE -dBATCH -g500x300 -dPDFFitPage -sOutputFile=image.jpg image.pdf
In this command, as you can see, -g500x300 sets the converted image size (width x height).
Is there a way to set just the width, without having to input the height, so that the height is scaled from the width using the original aspect ratio? I know it can be achieved with ImageMagick convert, where you simply put 0 for the height parameter, i.e. -resize 500x0. I tried this with Ghostscript, but I don't think that is the correct way to do it.
I decided not to use ImageMagick convert because it is very slow when converting a large multi-page PDF.
Thanks for the help!
This post explains why Ghostscript is faster: https://serverfault.com/questions/167573/fast-pdf-to-jpg-conversion-on-linux-wanted. The only workaround to fix the speed would involve modifying the ImageMagick code.
Unfortunately, autodetermined output size is not supported by ghostscript. This is primarily because the -g option used is actually determining the device size that will hold the rendered output, and not the rendered output itself. That output size is changing because of the -dPDFFitPage switch which then tries to match the device size. And although you can define just the height of the jpeg 'device' using -dDEVICEHEIGHT=n, that will leave the device width at the unchanged default.
Although it is a somewhat tedious workaround, you can use Ghostscript or ImageMagick to get the width and height of the PDF page(s). To do this using Ghostscript, see the answer to Using GhostScript to get page size. You can then calculate the proper height to pass to the -g flag so that the aspect ratio is kept. Bonus points if you can figure out a single set of commands to do all this :)
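The aspect-ratio calculation in that workaround is short enough to sketch. It assumes the page width and height have already been obtained by one of the methods above; the function name size_for_width and the A4 example values are illustrative only.

```python
def size_for_width(page_w, page_h, target_w_px):
    """Build a -g option whose height keeps the page's aspect ratio.

    page_w and page_h can be in any unit (for example points), as long
    as both use the same one; only their ratio matters.
    """
    height_px = round(target_w_px * page_h / page_w)
    return f"-g{target_w_px}x{height_px}"


# Example: an A4 page (595 x 842 points) rendered 500 pixels wide
print(size_for_width(595, 842, 500))  # prints -g500x708
```

The resulting option would replace -g500x300 in the original command, with -dPDFFitPage still scaling the page content to that device size.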
You could write a PostScript program to do this readily enough. Here is a start:
%!
% usage: gs -sFile=____.pdf scale.ps
/File where not {
(\n *** Missing source file. \(use -sFile=____.pdf\)\n) =
Usage
} {
pop
}ifelse
% Get the width and height of a PDF page
%
/GetSize {
pdfgetpage currentpagedevice
1 index get_any_box
exch pop dup 2 get exch 3 get
/PDFHeight exch def
/PDFWidth exch def
} def
%
% The main loop
% For every page in the original PDF file
%
1 1 PDFPageCount
{
/PDFPage exch def
PDFPage GetSize
% In here, knowing the desired destination size
% calculate a scale factor for the PDF to fit
% either the width or height, or whatever
% The width and height are stored in PDFWidth and PDFHeight
PDFPage pdfgetpage
pdfshowpage
} for
pdfgetpage and pdfshowpage are internal Ghostscript extensions to the PostScript language for handling PDF files.
To resize the image with Ghostscript, use -dDownScaleFactor, e.g.
gs -dBATCH -dNOPAUSE -r300 -dDownScaleFactor=3 -sDEVICE=png16m -sOutputFile=/tmp/26a0e9f7-3f26-437d-9a97-1653074e819a_%d.png /tmp/temp.pdf
On its own, -r300 produces a huge image.
Scaling down by a factor of 3 reduces the size while keeping the aspect ratio.
You can use this if setting an exact width is not important, which is the case for most uses.