ImageMagick - PDF resize without stretching interior content

I have a PDF of size 8.27in × 10.87in, but the printer at my work requires pages be US Letter size (8.5in x 11in). ImageMagick can achieve the resizing with:
convert -page Letter mypdf.pdf mypdf-converted.pdf
The resulting pdf is 8.5x11 and prints out fine, but the text and images in the pdf are stretched. What I want is exact copies of original 8.27x10.87 pages stuck on 8.5x11 pages with no stretching of the interior content. Ideally the content would be centered, but having the content aligned to one side is fine as well.
I've messed with some of the -filter options, but to no avail. Is there an option for what I'm describing?
Also, as a note: I tried printing the file to a PDF in Document Viewer (the default Linux PDF viewer) and adjusting the page settings, but that caused what I think are weird character encoding issues. For some reason I can view the pdf on my computer fine, but the printer prints out lots of "!"s and "%"s and other random characters in place of the text. The resulting PDF from ImageMagick doesn't cause any problems like this.
Thanks!

You can use our free cpdf tool:
Coherent PDF Command Line Tools Community Release
First, change the size of the pages:
cpdf -mediabox "0 0 8.5in 11in" in.pdf -o out.pdf
Then, shift the contents rightward and upward by half the difference in size:
cpdf -shift "0.115in 0.065in" out.pdf -o out2.pdf
Or, all together:
cpdf in.pdf -mediabox "0 0 8.5in 11in" AND -shift "0.115in 0.065in" -o out.pdf
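The shift values are simply half of the leftover space on each axis. As a minimal sketch of that arithmetic in Python (the helper name `centering_shift` is made up for illustration, and the script only builds the command string, it does not run cpdf):

```python
def centering_shift(old_w, old_h, new_w, new_h):
    """Return the (x, y) offset, in the same units as the inputs,
    that centres an old_w x old_h page on a new_w x new_h page."""
    return (new_w - old_w) / 2, (new_h - old_h) / 2


# Page sizes from the question, in inches.
dx, dy = centering_shift(8.27, 10.87, 8.5, 11.0)

# Assemble the combined cpdf invocation with the computed offsets.
command = (
    'cpdf in.pdf -mediabox "0 0 8.5in 11in" '
    f'AND -shift "{dx:.3f}in {dy:.3f}in" -o out.pdf'
)
print(command)
```

For the sizes in the question this works out to about 0.115in horizontally and 0.065in vertically.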

Related

How to make invisible (e.g. OCR) text visible after removing text-images from PDF with Ghostscript

I used gs -o 'out.pdf' -sDEVICE=pdfwrite -dFILTERIMAGE 'in.pdf' to remove all images from some PDF files to minimize their file sizes. Now in some of those PDFs, the result is invisible text, as they only consisted of scanned pages with an invisible OCR layer on top. Is there some way to make that OCR text visible?
The answer depends very much on how the OCR was done. Here is an exceptionally clean sample result from AWS Textract (reality is less perfect, since it depends on each image).
Several things to note: the colorless text is often not aligned with the real letter positions, because character, word, or line blocks have to be averaged out. It tends to sit lower than the visible glyphs, in the worst cases as low as an underline. Its width is often set to 1 point, with no stroke and no fill.
When you strip the image then nothing shows
At this juncture you have a few choices, but generally you need to blacken what's left. cpdf can do that well in some cases; however, I had no success using:
cpdf -blacktext -color black -opacity 1.0 in.pdf -o out.pdf
I had hoped it would do this, but alas, not today. In fact every command-line tool I tried had problems with the "invisible text", even though it is clearly seen by pdftotext and so could be reprinted as a PDF.
The best I could do was use a GUI editor to recolor the text, so Inkscape or a similar programmable graphics app, or an API like Acrobat/iText, will most likely be needed to change the text appearance.
The only way to make that text visible would be to edit the text rendering mode in the PDF file and change it from 3 to 0. To do that you would need to edit the actual content of the PDF, which would most probably mean you would have to decompress it, then edit the file looking for "3 Tr" and replacing with "0 Tr".
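A rough sketch of that search-and-replace in Python, assuming the content streams have already been decompressed. The function name `make_text_visible` is hypothetical, and the pattern is naive: it would also rewrite a literal "3 Tr" that happened to occur inside a text string. Because "3 Tr" and "0 Tr" have the same length, the replacement does not change any byte offsets in the file.

```python
import re

# Matches the operand/operator pair "3 Tr" (text rendering mode 3 = invisible)
# as whole tokens, so that e.g. "13 Tr" or "0.3 Tr" is left alone.
INVISIBLE_TEXT = re.compile(rb"(?<![\d.])3(\s+)Tr\b")


def make_text_visible(content: bytes) -> bytes:
    """Switch rendering mode 3 (invisible) to 0 (fill) in decompressed PDF data."""
    return INVISIBLE_TEXT.sub(rb"0\g<1>Tr", content)


# Small made-up content stream fragment for illustration.
sample = b"BT /F1 1 Tf 3 Tr (Hello) Tj ET"
print(make_text_visible(sample))
```

To apply it to a whole file, read the decompressed PDF in binary mode, pass the bytes through the function, and write the result to a new file.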
You can do:
cpdf -remove-all-text in.pdf -o out.pdf

Change a PDF's 0.00pt Lines to a Larger Size

In this PDF, the drawings on the second-to-last page apparently use a 0.00pt line width. This makes them almost unreadable on-screen, and completely invisible when printed.
Is there a relatively painless way to change these "no width" lines to have some width? There are lots of small details, so converting to image will not retain enough detail unless an outlandish resolution is used... then the "no width" issue re-emerges.
I've installed GhostScript, ran pdf2ps in.pdf med.ps then ps2pdf med.ps out.pdf and the line weights are exactly the same. Next, I opened med.ps in a text editor, hoping I could make a python script "find and replace" these zero line widths, but I'm seeing nothing like "0 w" in the file. Perhaps it is defined in a macro somewhere, but I'm not seeing it.
This idea came from Change the width of all lines in a PDF programmatically and Thicken line weights when printing PDF.
Best bet is to use a tool to decompress the PDF file (e.g. using MuPDF: mutool clean -d <in.pdf> <out.pdf>, or with Ghostscript: gs -sDEVICE=pdfwrite -o out.pdf -dCompressPages=false in.pdf), then use a text editor or a scripting tool such as sed to look for "0 w" and replace it with 'something else'.
PDF isn't a programming language, unlike PostScript, so you can reliably search for operator usage like this in a PDF file. Trying to do the same in a PostScript file is, as beginner6789 says above, extremely hard.
If you then want the final file compressed, you could run the edited file through Ghostscript's pdfwrite device using something like gs -sDEVICE=pdfwrite -o final.pdf in.pdf.
You absolutely should not use Ghostscript's ps2write device to produce PostScript; the PostScript imaging model is not entirely compatible with PDF, and any PDF constructs which cannot be represented in PostScript (such as any kind of transparency) will be rendered to an image. Really, don't do this.
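The search-and-replace step could be sketched in Python roughly as follows, assuming the file has already been decompressed. The name `thicken_hairlines` and the default width of 1 are illustrative choices, not anything from the tools above. The replacement is padded with spaces, which PDF treats as insignificant, so that the byte length stays the same whenever the new width is no longer than the old one. A longer replacement would change stream lengths and offsets, and the file would then need repairing by whatever tool reads it next.

```python
import re

# A zero operand followed by the "w" (set line width) operator,
# e.g. "0 w" or "0.00 w", but not "10 w", "1.0 w" or "0.5 w".
ZERO_WIDTH = re.compile(rb"(?<![\d.\-])(0(?:\.0*)?)(\s+)w\b")


def thicken_hairlines(content: bytes, new_width: bytes = b"1") -> bytes:
    """Replace zero line widths in decompressed PDF data with new_width."""

    def replace(match):
        number, gap = match.group(1), match.group(2)
        # Pad with spaces up to the original operand length so that
        # byte offsets are preserved where possible.
        return new_width.ljust(len(number)) + gap + b"w"

    return ZERO_WIDTH.sub(replace, content)


# Small made-up content stream fragment for illustration.
sample = b"q 0 w 0 0 m 10 10 l S Q"
print(thicken_hairlines(sample))
```

Only the zero-width cases are touched, so drawings that mix several line weights keep their other widths.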
This could be a problem if there are a lot of different weights used and you just want to change the 0.0 width lines. If they were all 0.0 then placing this early in the page could work unless the postscript looks in the system dictionaries for the command:
/setlinewidth {pop} def
The default linewidth for my ghostscript is 1.0 so that should be used automatically instead of the 0.0 linewidth.
The pdf2ps usually has a lot of pdf style dictionaries so finding the code used for setlinewidth can be confusing. The setlinewidth must be there someplace. Some people like to read postscript.
Pdf files aren't really meant to be edited so I use these options to make reading the final pdf easier: -dCompressPages=false -dCompressStreams=false just in case there is some useful information to look at in the pdf.
EDIT: depending on the code used to create the original postscript there might be labels like this:
dup/LW//knownget exec{
setlinewidth
}if
/w/setlinewidth load def
So there could be LW or w used for setlinewidth like this simple example. Most are not this simple.
EDIT2: There is some good info here:
How to change the width of lines in a PDF/PostScript file

Ghostscript add white background image

I have a script which automatically adds a gutter to a PDF file. It adds gutter to left for ODD numbered pages and gutter to the right for EVEN numbered pages. It does this by moving the existing image over.
Here is the code for that:
gs -sDEVICE=pdfwrite -dPDFSETTINGS=/printer -o output.pdf \
-dDEVICEWIDTHPOINTS=513 \
-dDEVICEHEIGHTPOINTS=738 -dFIXEDMEDIA -c \
"<< /CurrPageNum 1 def /Install { /CurrPageNum CurrPageNum 1 add def CurrPageNum 2 mod 1 eq \
{-4.5 0 translate} {4.5 0 translate} \
ifelse } bind >> setpagedevice" -f input_file.pdf
I've found that when I send this PDF file to the printer, the additional space is not "counting", so the printed page is now narrower. I think this is because transparency doesn't count in the PDF, and so when sent to the printer the pages are seen as narrower.
Is it possible to add a white background to the pdf so it ISN'T seen as transparent? Or is there an alternative way to fix this?
I'm afraid your assumption is flawed. Your 'translate' has no transparency involvement at all; it's shifting the content on the media (NB this is not an image, i.e. a bitmap, in general. It's more complex content). All the content is shifted, no matter whether it is transparent or not.
I'm afraid I can't follow what you mean about the printed page being 'narrower'. The media request will be for a page 513x738 points, which is a really weird size: 7.125 by 10.25 inches. Unless that matches the page size of your printer, it's going to do 'something' with the result. Probably it will center it if the media is larger than the request, but if the media is smaller than requested, then it will either scale it down or crop it. Either will result in something different from what you expect.
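The inch figures follow from the default PostScript/PDF unit of 72 points per inch. A small Python sketch of the conversion (the helper name `points_to_inches` is just for illustration):

```python
POINTS_PER_INCH = 72  # default PostScript/PDF user-space unit


def points_to_inches(points):
    """Convert a length in points to inches."""
    return points / POINTS_PER_INCH


# Media size requested by -dDEVICEWIDTHPOINTS / -dDEVICEHEIGHTPOINTS.
width_in = points_to_inches(513)
height_in = points_to_inches(738)
print(f"{width_in} x {height_in} inches")

# For comparison, US Letter is 612 x 792 points.
print(f"{points_to_inches(612)} x {points_to_inches(792)} inches")
```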
Is there a reason you are changing the media size of the original PDF file ?
If the media request does match the printer then it's still possible that there will be cropping or scaling going on, because the printable area may not be the same as the size of the media. The paper handling of some printers means that they cannot print all the way to the edge of the media. In that case the printer may scale or crop the output again.
You can easily eliminate transparency as the culprit by simply starting with a test file which does not contain any transparency. If you aren't certain, then one solution would be to use a recent version of Ghostscript and use the pdfimage32 device. That will create a PDF file from the original PDF, but the output file will only contain a bitmap image, no transparency at all.
To help us consider the problem, it would be helpful to see the original PDF file, the PDF file you send to the printer, and a scan or photograph of the final printed page. It would also be useful to know the version of Ghostscript you are using, the make and model of the printer, and how you are sending the PDF file to the printer.

Converting from pdf to png with ghostscript, result with many white boxes

I'm converting pdf (created with adobe illustrator) into transparent png file, with following command:
gs -q -sDEVICE=pngalpha -r300 -o target.png -f source.pdf
However, there are undesired white boxes in the resulting PNG. They look like they are generated automatically by Ghostscript, perhaps as some kind of bounding box (see attached image).
I tried both gs-9.05 and gs-9.10, with the same bad result.
I've tried to export to PNG file from Illustrator or Inkscape manually, the result is good.
What does Inkscape do to render it correctly, and
how can I eliminate those white boxes using Ghostscript?
Try mudraw from the latest (1.3) MuPDF. As far as I have checked, it creates nice PNGs from PDF files with 1.4 transparency:
mudraw -o out.png -c rgba in.pdf
"rgba" being, as you understand, RGB + alpha
In the general case, you can't. PDF does support transparency, but the underlying media is always assumed to be white and opaque. So anywhere that marks are made on the medium is no longer transparent; it's white.
You don't say which version of Ghostscript you are using, but if it's earlier than 9.10 you could try upgrading.

Using ImageMagick or Ghostscript (or something) to scale PDF to fit page?

I need to shrink some large PDFs to print on an 8.5x11 inch (standard letter) page. Can ImageMagick/Ghostscript handle this sort of thing, or am I having so much trouble because I'm using the wrong tool for the job?
Just relying on the 'shrink to page' option in client-side print dialogs is not an option, as we'd like for this to be easy-to-use for the end users.
I would not use convert. It uses Ghostscript in the background, but is much slower. I'd use Ghostscript directly, since it gives me much more direct control (and also some control over settings which are much more difficult to achieve with convert). And for convert to work for PDF-to-PDF conversion you'll have Ghostscript installed anyway:
gs \
-o /path/to/resized.pdf \
-sDEVICE=pdfwrite \
-dPDFFitPage \
-r300x300 \
-g2550x3300 \
/path/to/original.pdf
The problem with using ImageMagick is that you are converting to a raster image format, increasing file size and decreasing quality for any vector elements on your pages.
Multivalent will retain the vector information of the PDF.
Try:
java -cp Multivalent.jar tool.pdf.Impose -dim 1x1 -paper "8.5x11in" myFile.pdf
to create an output file myFile-up.pdf
ImageMagick's mogrify/convert commands will indeed do the job. Stephen Page had just about the right idea, but you do need to set the dpi of the file as well, or you won't get the job done.
Assuming you have a file that's 300 dpi and already the same aspect ratio as 8.5 x 11 the command would be:
# 300 dpi x 8.5 in = 2550 px, 300 dpi x 11 in = 3300 px
convert original.pdf -density "300" -resize "2550x3300" resized.pdf
If the aspect ratio is different, then you need to do some slightly trickier cropping.
The Ghostscript approach worked well for me. (I moved my file from my Windows PC to a Linux computer and ran it there.) I made one small change to the Ghostscript command because the Ghostscript resize command above completely fills an 8.5 by 11 inch page. My printer cannot print to the edge, though, so several millimeters along each page edge were lost. To overcome that problem, I scaled my PDF document to 0.92 of a full 8.5 by 11 inches. That way I saw everything centered on the page and had a slight margin. Because 0.92 * (2550x3300) = (2346x3036), I ran the following Ghostscript command:
gs -sDEVICE=pdfwrite \
-dPDFFitPage \
-r300x300 \
-g2346x3036 \
/home/user/path/original.pdf \
-o /home/user/path/resized.pdf
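The -g values are just the page size in inches multiplied by the -r resolution, optionally multiplied by a scale factor. A minimal Python sketch of that calculation (the helper name `device_pixels` is made up; it only prints the option strings and does not call Ghostscript):

```python
def device_pixels(width_in, height_in, dpi, scale=1.0):
    """Return (width, height) in device pixels for a page of the given
    size in inches at the given resolution, shrunk by an optional factor."""
    return round(width_in * dpi * scale), round(height_in * dpi * scale)


DPI = 300

# Full US Letter page, as in the first Ghostscript command.
full = device_pixels(8.5, 11, DPI)
print(f"-r{DPI}x{DPI} -g{full[0]}x{full[1]}")

# Reduced to 0.92 of the full size, as in the command above.
reduced = device_pixels(8.5, 11, DPI, scale=0.92)
print(f"-r{DPI}x{DPI} -g{reduced[0]}x{reduced[1]}")
```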
If you use Insert > Image... in LibreOffice Writer to insert a PDF, you can use direct manipulation or its Image Properties to resize and reposition the PDF, and when you File > Export as... PDF the PDF remains vectors and text. Interestingly when I did this with a PDF invoice the PDF exported from LO is smaller than the original, but the Linux pdfimages command-line utility suggests LO preserves any raster images within the original PDF.
However, you want something easier-to-use for your end users than the print dialog's "Shrink to page" option. There are tools like Adobe Acrobat that lay out PDFs to form print jobs that are PDFs; I don't know which ones have a simple "Change the bounding box and scale to letter-size". Surprisingly the do-it-all qpdf tool lacks this feature.