How can I render a PDF to BMP, fitting the content to the PDF page boundaries?

I am generating a BMP from a PDF with Ghostscript, but the content is not fitted to the page boundaries. Whatever option I try, I cannot get the content to fit.
I've tried generating the BMP with different Ghostscript options, but none of them fits the content 100%.
This is the last command I tried. Please don't expect it to do what I need; it is just copied and pasted from the tty.
gs -dBATCH -dNOPAUSE -sPAPERSIZE=a4 -dFIXEDMEDIA -dPSFitPage -sDEVICE=bmpmono -sOutputFile=Betlem.bmp -g1184x968 -c "<</PageSize [900 500]>> setpagedevice 0 0 translate" -c "<</PageOffset [-23 -100]>> setpagedevice" -f Betlem.pdf
I expect the content to be fitted to the BMP image borders, to the pixel. I am using an OpenCV and Python function to extract the content and fit it into a new image, and this is the debug output:
initial BMP image resolution = (872, 900)
BMP image resolution after fit content into new page = (541, 870)
Have a look at the following thread for the fitting function in Python:
I can't find a way to fit contour on new image zero point

You are using PSFitPage for a PDF file, you should be using PDFFitPage or just FitPage.
Note that the 'fitting' in this case is fitting the PDF media size to the existing media. If the PDF content leaves white space around the edge of the media, then the resulting scaling will include that.
In addition you are using PostScript to offset the page origin, which will introduce white space, and you are trying to change the media size, which won't work because you've set -dFIXEDMEDIA. Using these in combination with any of the FitPage switches is not likely to work well.
Randomly stabbing at controls and copying bits of code intended to solve different problems isn't likely to help you I'm afraid.
Without seeing an example file I can't, of course, tell you how to solve your problem, and I'm not really sure exactly what you are trying to achieve. A bitmap with no white space ? A bitmap of a given size with no white space ? Something else ?
[Edit]
OK so looking at the PDF file, the media box is 11.69x8.27 inches, there is white space at the top, bottom, left and right between the marks on the page and the edge of the media.
Running this through Ghostscript, to TIFF at 72 dpi results in a file which Adobe Photoshop says is 11.694x8.264 inches and has white space at top bottom left and right, just like the PDF file.
By default Ghostscript uses the Media size from the PDF to render to, however you can change this. If you were to change the media size to (say) 5.8x4.14 inches, set -dFIXEDMEDIA and then rendered the PDF file what would happen is that the top and right hand side of the PDF file would be 'off the page' so you would only get the left hand portion rendered. Try this:
gs -dDEVICEWIDTHPOINTS=421 -dDEVICEHEIGHTPOINTS=298 -dFIXEDMEDIA "A betlem m en vull anar(1).pdf"
You will see the white space is still present at bottom and left, and the top and right have fallen off the page.
Now, if you add FitPage that will scale the original media down until it fits the new media size (and all the content too, of course). If you try:
gs -dDEVICEWIDTHPOINTS=421 -dDEVICEHEIGHTPOINTS=298 -dFIXEDMEDIA -dFitPage "A betlem m en vull anar(1).pdf"
You'll see that the output has the same physical dimensions as the previous command, but now the whole of the PDF content can be seen because it's been scaled down. You should also see that the distribution of white space has changed, because I didn't strictly divide by 2 in each direction. The FitPage switch scaled the content in both directions by the same amount, and distributed the extra space in the x direction evenly to each side, as new white space.
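To make the FitPage behaviour described above concrete, here is a minimal Python sketch of the arithmetic (one uniform scale factor, leftover space split evenly). The function name and the 842x595 / 400x300 sizes are purely illustrative; they are not taken from the file in the question, and this is only my reading of the behaviour, not Ghostscript's actual code.

```python
def fit_page(src_w, src_h, dst_w, dst_h):
    """Scale a source page uniformly so it fits inside the target media.

    Returns (scale, x_offset, y_offset); all lengths are in points.
    The offsets are the white space added on each side to centre the page.
    """
    # The smaller ratio wins, so the whole page fits in both directions.
    scale = min(dst_w / src_w, dst_h / src_h)
    x_offset = (dst_w - src_w * scale) / 2
    y_offset = (dst_h - src_h * scale) / 2
    return scale, x_offset, y_offset


# Illustrative sizes only: a landscape 842x595 pt page into 400x300 pt media.
scale, dx, dy = fit_page(842, 595, 400, 300)
print(f"scale={scale:.4f}, extra space per side: x={abs(dx):.2f} pt, y={dy:.2f} pt")
```

With these sizes the width is the limiting direction, so all of the new white space ends up above and below the content. Any white space already inside the PDF page is scaled along with everything else; it is never removed.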
Now I've no clue what you mean by 'simmetric'. You can undoubtedly do what you want using Ghostscript and the PostScript language, but I don't know what it is you want. Pointing me at Python code isn't going to help I'm afraid, I don't speak Python.
I can say that Ghostscript does not add extra white space that isn't present in the original unless you mess with the rendering by adding parameters like FitPage and FIXEDMEDIA.
If you can explain what you are trying to achieve I can probably tell you what to do.

Related

Ghostscript add white background image

I have a script which automatically adds a gutter to a PDF file. It adds gutter to left for ODD numbered pages and gutter to the right for EVEN numbered pages. It does this by moving the existing image over.
Here is the code for that:
gs -sDEVICE=pdfwrite -dPDFSETTINGS=/printer -o output.pdf \
-dDEVICEWIDTHPOINTS=513 \
-dDEVICEHEIGHTPOINTS=738 -dFIXEDMEDIA -c \
"<< /CurrPageNum 1 def /Install { /CurrPageNum CurrPageNum 1 add def CurrPageNum 2 mod 1 eq \
{-4.5 0 translate} {4.5 0 translate} \
ifelse } bind >> setpagedevice" -f input_file.pdf
I've found that when I send this PDF file to the printer, the additional space is not "counting", so the file is now narrower. I think this is because transparency doesn't count on the PDF, and so when sent to the printer the pages are seen as narrower.
Is it possible to add a white background to the pdf so it ISN'T seen as transparent? Or is there an alternative way to fix this?
I'm afraid your assumption is flawed. Your 'translate' has no transparency involvement at all; it's shifting the content on the media (NB this is not an image, i.e. a bitmap, in general. It's more complex content). All the content is shifted, no matter whether it is transparent or not.
I'm afraid I can't follow what you mean about the printed page being 'narrower'. The media request will be for a page of 513x738 points, which is a really weird size: 7.125 by 10.25 inches. Unless that matches the page size of your printer, it's going to do 'something' with the result. Probably it will center it if the media is larger than the request, but if the media is smaller than requested, then it will either scale it down or crop it. Either will result in something different to what you expect.
Is there a reason you are changing the media size of the original PDF file ?
If the media request does match the printer then it's still possible that there will be cropping or scaling going on, because the printable area may not be the same as the size of the media. The paper handling of some printers means that they cannot print all the way to the edge of the media. In that case the printer may scale or crop the output again.
You can easily eliminate transparency as the culprit by simply starting with a test file which does not contain any transparency. If you aren't certain, then one solution would be to use a recent version of Ghostscript and use the pdfimage32 device. That will create a PDF file from the original PDF, but the output file will only contain a bitmap image, no transparency at all.
To help us consider the problem, it would be helpful to see the original PDF file, the PDF file you send to the printer, and a scan or photograph of the final printed page. It would also be useful to know the version of Ghostscript you are using, the make and model of the printer, and how you are sending the PDF file to the printer.

How do you create a PDF for Kindle Direct Publishing with wkhtmltopdf

Kindle Direct Publishing AKA Create Space wants a PDF/X-1a in 6x9 format with 0.25" outside margins and 0.375" inside/gutter margins, and I need help with that. The Qt PDF generator does not distinguish inside and outside margins, so I have to set both to the larger value. I also need to know whether my CSS affects this and, if so, how. Setting --margin-left .375in --margin-right .375in gives me this error: "Currently all margin units must be the same". I'm not sure what that means: why have left and right margins if they must both be the same, and does this really have to apply to top and bottom too? I added the same value to top and bottom just to make the file, but those are not the margins I wanted. Can gs fix this?
If so, how?
I know that wkhtmltopdf currently only creates PDF version 1.4, and Kindle does not seem to mind that much on upload. I do not have a published upload yet, so I hope someone who has one knows this from experience, because I do not know whether they will accept it. I also use Ghostscript to convert the file to PDF version 1.7. This is what I have currently:
PDF_Combine is a bash array of files:
PDF_Combine=("file1.html" "file2.html");
Update: KDP now wants .875in margins on both sides, and my content is really small. How does CSS affect margins in a PDF? Can I set the margins to 0 in wkhtmltopdf and adjust them in my CSS and, if so, how? In the body?
wkhtmltopdf --margin-left .375in --margin-right .375in --margin-bottom .375in --margin-top .375in --page-width 6in --page-height 9in --load-error-handling ignore --javascript-delay 3333 --enable-forms --footer-center "[page]/[topage]" "${PDF_Combine[@]}" "/MyPath/MyFileName.pdf"
The Ghostscript command:
gs -dPDFA -dBATCH -dNOPAUSE -sProcessColorModel=DeviceCMYK -sDEVICE=pdfwrite -sPDFACompatibilityPolicy=1 -sOutputFile="/MyPath/MyOutPutFileName.pdf" "/MyPath/MyInPutFileName.pdf"
PDF/X-1a is a highly specific, restricted set of PDF features; for instance, you cannot use the RGB colour model when producing a PDF/X-1a file.
You also need specific keys to be present in the Info dictionary of the PDF file.
Ghostscript does not create PDF/X-1a files. It can create PDF/X-3 files, but that's no good to you.
You can process a PDF file using Ghostscript to add white space around the page. What you need to do is specify a larger media size, and tell Ghostscript it's a fixed size, so the PDF file can't change it. Then you need to offset the content up and right on the new media (because otherwise the content will be rendered in the bottom left corner). Note: it's not clear to me whether you want to increase the media size, or reduce the content so that it fits into the required media but has margins.
See the answer and comments on the question here.
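As a rough illustration of the "larger fixed media plus offset" approach described above, here is a Python sketch that works out the new media size and the offset, then assembles a Ghostscript argument list. It only uses switches that already appear elsewhere in these threads (-dDEVICEWIDTHPOINTS, -dDEVICEHEIGHTPOINTS, -dFIXEDMEDIA, PageOffset). The function names, file names, the 387x612 pt starting page and the 0.25in top and bottom margins are all assumptions made up for the example, and it applies the same gutter to every page rather than alternating inside and outside.

```python
def pad_page(page_w, page_h, left, right, bottom, top):
    """Return (media_w, media_h, x_offset, y_offset) in points.

    The content must move right by the left margin and up by the
    bottom margin, otherwise it stays in the bottom-left corner.
    """
    return page_w + left + right, page_h + bottom + top, left, bottom


def gs_pad_args(src, dst, page_w, page_h, left, right, bottom, top):
    """Build (but do not run) a Ghostscript command line for the padding."""
    media_w, media_h, dx, dy = pad_page(page_w, page_h, left, right, bottom, top)
    return [
        "gs", "-o", dst, "-sDEVICE=pdfwrite",
        f"-dDEVICEWIDTHPOINTS={media_w}",
        f"-dDEVICEHEIGHTPOINTS={media_h}",
        "-dFIXEDMEDIA",
        "-c", f"<</PageOffset [{dx} {dy}]>> setpagedevice",
        "-f", src,
    ]


# Hypothetical input: 387x612 pt pages, 0.375in (27 pt) gutter on the left,
# 0.25in (18 pt) on the other three sides, giving a 6x9in (432x648 pt) page.
args = gs_pad_args("in.pdf", "out.pdf", 387, 612, 27, 18, 18, 18)
print(" ".join(args))
```

Pass the list to subprocess.run(args, check=True) to actually run it. This only moves the content; it does nothing towards PDF/X-1a conformance.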

Using Ghostscript to change page size from A4 and other to Letter

I recently asked this question about changing paper size and have a command that scales properly most of time:
gs -sDEVICE=pdfwrite -sOutputFile="$outFile" -dBATCH -dNOPAUSE -q -dDEVICEHEIGHTPOINTS=792 -dDEVICEWIDTHPOINTS=612 -dPDFFitPage -dFIXEDMEDIA "$inFile"
We need to have it generate output pages that are US Letter. I have two problem files I can post. One needs extra whitespace; the other just needs a very small rescaling but it gets a wildly different output.
The first file is an A4 file. The command scales it to a height of 792 and the width is scaled to 559.667. It's scaled accurately, but we need whitespace either on both sides or on the right. How can I modify my command (or run a second command) to do this?
The second file is 8.52" x 11.02". For this one I get an output file that is 549.127 x 709.8 pts and I don't get it at all. 0.02" is within our printing tolerances so we can let it go, but a) I'd rather have just one process b) maybe the issue isn't the small scaling adjustments and maybe it will be a problem for other files.
These, along with your previous question, are all related to the myriad different Boxes available in a PDF file, and the various ways that a PDF processor will deal with the available Boxes.
For your first file, the output of pdfwrite has a MediaBox of 612x792 but a CropBox of [26.1662903 0 585.83374 792.0], which is (of course) not Letter. It's the result of scaling A4 down to Letter and centring that scaled-down area on the Letter media.
If you remove the CropBox (using a binary editor) and open the file in Acrobat you will see that the white space is evenly distributed left and right of the page.
So really it's up to your printing process. Either you need to tell that process to use the MediaBox and ignore the CropBox, or have it centre the CropBox on the media when the CropBox is not the same as the media.
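The CropBox quoted above can be reproduced by hand. This is a small Python sketch of that arithmetic, assuming the A4 page is the rounded 595x842 points (the exact MediaBox of the file isn't quoted here, so that figure is my assumption, as is the function name).

```python
A4 = (595.0, 842.0)      # assumed (rounded) A4 size in points
LETTER = (612.0, 792.0)  # US Letter in points


def centred_box(src, dst):
    """Box [llx, lly, urx, ury] covered by src after a uniform fit, centred on dst."""
    scale = min(dst[0] / src[0], dst[1] / src[1])
    w, h = src[0] * scale, src[1] * scale
    llx, lly = (dst[0] - w) / 2, (dst[1] - h) / 2
    return [llx, lly, llx + w, lly + h]


box = centred_box(A4, LETTER)
width = box[2] - box[0]
print(f"scaled width: {width:.3f} pt, white space per side: {box[0]:.3f} pt")
```

The width comes out at roughly 559.667 points, matching the figure in the question, with about 26.166 points of white space on each side of the Letter page.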
Your second file has a MediaBox of 684x864, which is 9.5x12 inches. However, it has a TrimBox of [33.8027 33.7217 647.533 827.028], which, doing the arithmetic, works out to 8.524x11.018 inches.
Clearly Acrobat (or whatever you are using to get the size) is using the TrimBox, not the MediaBox. Ghostscript uses the MediaBox by default; if you want it to use a different *Box then you have to tell it so. Try adding -dUseTrimBox to your command line. See my answer to your previous question :-)
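For what it's worth, the same box arithmetic appears to account for the "wildly different" 549.127 x 709.8 pts in the question. A short Python sketch, using the box values quoted above (the variable and function names are mine):

```python
POINTS_PER_INCH = 72.0

media_box = (0.0, 0.0, 684.0, 864.0)             # MediaBox quoted above
trim_box = (33.8027, 33.7217, 647.533, 827.028)  # TrimBox quoted above
letter = (612.0, 792.0)                          # requested media, in points


def box_size(box):
    """Width and height of a [llx, lly, urx, ury] box."""
    llx, lly, urx, ury = box
    return urx - llx, ury - lly


trim_w, trim_h = box_size(trim_box)
media_w, media_h = box_size(media_box)

# The size a viewer reports when it honours the TrimBox.
trim_inches = (trim_w / POINTS_PER_INCH, trim_h / POINTS_PER_INCH)

# The fit is computed from the MediaBox, so the trimmed area
# shrinks by the MediaBox-to-Letter scale factor as well.
scale = min(letter[0] / media_w, letter[1] / media_h)
scaled_trim = (trim_w * scale, trim_h * scale)

print(f"TrimBox: {trim_inches[0]:.3f} x {trim_inches[1]:.3f} in")
print(f"scale {scale:.4f} -> trimmed area {scaled_trim[0]:.3f} x {scaled_trim[1]:.3f} pt")
```

In other words, the 9.5x12 inch MediaBox is what gets fitted to Letter, and the trimmed area is shrunk along with it.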

Import vector graphics from PDF to GIMP

I need to extract vector graphics from a PDF image and import them into GIMP, either as paths or as high-resolution raster images. Specifically, I need to get contour lines from USGS topographical maps and overlay them on satellite images. Any suggestions?
So far I have tried:
--Using GIMP's native PDF importing function to import them as raster images. Problem: To do so at high resolution crashes my computer. Possible solution would be to import only a selected area of a PDF, but as far as I can tell this is not possible.
--Using ImageMagick to convert the PDF to a raster image. Problem: Used with the "-scale" parameter, "convert" appears to rasterize the PDF and then upscale it, leading to a choppy image.
--Using InkScape to extract the necessary vector elements from the PDF. Problem: InkScape freezes when I try to open a moderately large (25 Mb) PDF.
Any other ideas?
Many thanks,
treacl
The option you didn't mention above is to try to use the ghostscript program directly to render your output - ghostscript is used internally by GIMP to import PDF files, so you likely have it installed already.
There are tens of command switches to pass ghostscript for it to render a file into another format - the switches you need to pass are for determining the output size, resolution and which page to print. I didn't find any switch to select a portion of the page to be rendered - so, if your document is a single page, it is possible the generated file will still be too big for GIMP - but you will likely be able to crop it with ImageMagick, at least.
I guess the relevant command line for you would be something along:
gs -dNOPAUSE -dBATCH -sDEVICE=png16m -sOutputFile=page.png -dFirstPage=<pagenumber> -dLastPage=<pagenumber> -r<dpiresolution> -f<filename.pdf>
If the resulting image is still too large to be generated or operated upon, you can try changing the output format to use a smaller color depth (this one, png16m, is 3 bytes per pixel). It should be possible to pass postscript commands to transform the device, so that the area of interest is scaled up to your page size (and the remaining parts are cropped out of the rendering) - that would be the definitive fix for you - but off the top of my head, I don't know how to do that with ghostscript.
Alternatively, you can try passing ImageMagick the -density parameter as suggested in the comments.

Quality degradation of a text pdf after pdf>png>pdf

I have a very specific requirement where I must automatically stamp every page of a PDF file (for a faxing application), so here's the process I've made:
step 1: Convert PDF to PNG, one png file per page
cmd1: gs -dSAFER -dBATCH -dNOPAUSE -sDEVICE=png16m -dGraphicsAlphaBits=4 -dTextAlphaBits=4 -r400 -sOutputFile=image_raw.png input.pdf
cmd2: mogrify -resize 31.245% image_raw.png
input.pdf (input): https://www.dropbox.com/s/p2ajqxe99nc0h8m/input.pdf
image_raw.png (output): https://www.dropbox.com/s/4cni4w7mqnmr0t7/image_raw.png
step 2: Stamp every PNG file (using a third party tool ..)
image_stamped.png (output): https://www.dropbox.com/s/3ryiu1m9ndmqik6/image_stamped.png
step 3: Reconvert PNG files into one PDF file
cmd: convert -resize 1240x1753 -units PixelsPerInch -density 150x150 image_stamped.png output.pdf
output.pdf (output): https://www.dropbox.com/s/o9y0jp9b4pm08ci/output.pdf
The output file of the third step should "theoretically" be the same as the input file in step 1 (plus the stamp on it), but it's not. The file is somewhat blurry and turns out to be unreadable for humans after faxing, since blurred pixels don't survive the fax transmission. Even if you see no difference between input.pdf and output.pdf at first, try zooming in and you'll find that the text characters are blurred at their edges.
What are the best parameters to play with at input (step 1) or output (step 3)?
Thanks !
You are using anti-aliasing (TextAlphaBits=4). This 'smooths' the edges of text by introducing grey pixels between the black pixels of the text edges. At low resolutions (such as displays) this prevents the 'jaggies' in text and gives a more readable result. At higher resolutions its value is highly debatable.
Fax is a 1-bit monochrome medium, so the grayscale values have to be recreated by dithering. As you have discovered, this is not a good idea in a limited resolution device as it leads to a loss of sharpness.
I believe that if you remove the -dTextAlphaBits=4 you will see an immediate improvement. I would also suggest that you remove the GraphicsAlphaBits as well, since this will have the same effect on linework.
If you believe that you still want anti-aliasing you could try reducing the aggressiveness; you currently have it set to 4, try reducing it to 2.
Regarding the other comments;
Kurt is quite correct, as is fourat, and I'm afraid MarcB is mistaken, the -r400 sets the resolution for rendering, in dots per inch. If only one number is given it is used for both x and y resolution. It is possible to produce a fixed size raster using Ghostscript, but you use the -dFIXEDMEDIA with -sPAPERSIZE switches or the -g switch which also sets FIXEDMEDIA automatically.
While I do agree with yms and Kurt that converting the PDF to a bitmap format (PNG) and then back to PDF will result in a loss of quality, if the final PDF is only used for transmission via fax, it doesn't matter. The PDF must be rendered to a fax-resolution bitmap at some point in the process; it's not a big problem if it's done before the stamp is applied.
I don't agree with BitBank here, converting a vector representation to bitmap means rasterising it at a particular resolution. Once this is done, the resulting image cannot be rescaled without loss of quality, whereas the original vector representation can be as it is simply rendered again at a different resolution. Image in PDF refers to a bitmap, you can't have a vector bitmap. The image posted by yms clearly shows the effect of rendering a vector representation into an image.
One last caveat. I'm not familiar with the other tools being used here, but two of the command lines at least imply 'resize'. If you 'resize' a bitmap then the chances are that the tool will introduce the same kinds of artefacts (anti-aliasing) that you are having a problem with. Once you have created the bitmap you should not alter it at all. It's important that you create the PNG at the correct size in the first place.
And finally.....
I just checked your original PDF file and I see that the content of the page is already an image. Not only that, it's a DCT (JPEG) image. JPEG is a really poor choice of format for a monochrome image. It's a lossy compression format and always introduces artefacts into the image. If you open your original PDF file in Acrobat (or a similar viewer) and zoom in, you can see that there are faint 'halos' around the text; you will also see that the text is already blurry.
You then render the image, quite probably at a different resolution to the original image resolution, and at the same time introduce more blurring by setting -dGraphicsAlphaBits. You then make further changes to the image data which I can't comment on. In the end you render the image again, to a monochrome bitmap. The dithering required to represent the grey pixels leads to your text being unreadable.
Here are some ways to improve this:
1) Don't convert text into images like this, it instantly leads to a quality loss.
2) Don't compress monochrome images using JPEG
3) If you are going to work with images, don't keep converting them back and forth, work with the original until you are done, then make a PDF file from that, if you really must.
4) If you really insist on doing all this, don't compound the problem by using more anti-aliasing. Remove the -dGraphicsAlphaBits from the command line. You might as well remove -dTextAlphaBits as well since your files contain no text. Please read the documentation before using switches and understand what it is you are doing.
You should really think about your workflow here. Obviously we don't know what you are doing or why, so there may well be good reasons why some things are not possible, but you should try and avoid manipulating images like this. Because these are not vector, every time you make a change to the image data you are potentially losing information which cannot be recovered at a later stage. By making many such transformations (and your workflow as depicted seems to perform as many as 5 transformations from the 'original' image data) you will unavoidably lose quality.
If possible retain everything as vector data. When it is unavoidable to move to image data, create the image data as you need it to be finally used, do not transform it further.
I've had a closer look at the files you provided.
So, already the first image (image_raw), the result of the mogrify resize command, is fairly blurry at 1062x1375. While the blurriness does not get worse in the second image (image_stamped) which is the result of the third-party tool, the third image (extracted from your output.pdf), i.e. the result of that convert command, is even more blurred which is due to the graphic being resized (which is something you explicitly tell it to do).
I don't know at which resolution your fax program works, but there is more quality loss still, at least due to 24 bit colors to black-and-white transformation.
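To put rough numbers on the resizing described above, here is a Python sketch that follows the resolution through the three commands from the question. It assumes ImageMagick's default -resize WxH behaviour (fit inside the box while keeping the aspect ratio); the exact pixel rounding ImageMagick applies may differ slightly, and the function name is mine.

```python
def fit_within(size, box):
    """Aspect-preserving resize of size into box; returns (new_size, factor)."""
    w, h = size
    factor = min(box[0] / w, box[1] / h)
    return (round(w * factor), round(h * factor)), factor


render_dpi = 400                                 # gs -r400 (step 1)
dpi_after_mogrify = render_dpi * 31.245 / 100    # mogrify -resize 31.245%
raw_size = (1062, 1375)                          # image_raw size reported above

# convert -resize 1240x1753 (step 3)
final_size, factor = fit_within(raw_size, (1240, 1753))
# Pixel density relative to the original page after that second resize.
effective_dpi = dpi_after_mogrify * factor

print(f"after mogrify: {dpi_after_mogrify:.2f} dpi")
print(f"convert rescales by {factor:.3f} to {final_size}: about {effective_dpi:.1f} dpi, tagged as 150")
```

So the page is rendered at 400 dpi, reduced to roughly 125 dpi, and then enlarged again by about 17% in step 3. Every one of those resamplings smears the text edges a little more.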
If you insist on the workflow (i.e. pdf->png->stamped png->pdf->fax) you should:
1) In the initial rasterization, already use the per-inch resolution your rasterized image will have in all following steps (including fax transmission).
2) Refrain from anti-aliasing and the use of alpha bits (cf. KenS' answer).
3) Restrict the rasterized image to the colorspace available to the fax transmission, i.e. most likely black-and-white.
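The three points above could be sketched as a single Ghostscript call. This Python snippet only builds the argument list; the function name, the file names and the 204x196 dpi figure (my guess at a "fine" fax mode) are assumptions, so substitute whatever resolution your fax software really uses.

```python
def mono_render_args(pdf_path, out_pattern, dpi_x, dpi_y):
    """Ghostscript arguments for a 1-bit render at the final resolution.

    No TextAlphaBits/GraphicsAlphaBits switches are passed, so no
    anti-aliasing is requested, and pngmono writes black-and-white only.
    """
    return [
        "gs", "-dSAFER", "-dBATCH", "-dNOPAUSE",
        "-sDEVICE=pngmono",
        f"-r{dpi_x}x{dpi_y}",
        f"-sOutputFile={out_pattern}",
        pdf_path,
    ]


# %03d in the output name gives one PNG per page (page_001.png, ...).
args = mono_render_args("input.pdf", "page_%03d.png", 204, 196)
print(" ".join(args))
```

Run it with subprocess.run(args, check=True). Note that a 204x196 render has non-square pixels, so check that the stamping tool copes with that and with 1-bit PNGs before relying on it.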
PS As KenS pointed out, already the original PDF is merely a container for an image (with some blur to start with). Therefore, an alternative way to improve your workflow is to extract that image instead of rendering it, to stamp that original image and only resize it (again without anti-aliasing) when faxing.