ghostscript downsampling of pdf images, downsample factor error

ghostscript downsampling of pdf images, downsample factor error - pdf

I issue the following command:
gs \
-o downsampled.pdf \
-sDEVICE=pdfwrite \
-dDownsampleColorImages=true \
-dColorImageResolution=180 \
-dColorImageDownsampleThreshold=1.0 \
And get the following errors:
Subsample filter does not support non-integer downsample factor (1.994360)
Failed to initialise downsample filter, downsampling aborted
(on some pages)
and:
Subsample filter does not support non-integer downsample factor (2.000029)
Failed to initialise downsample filter, downsampling aborted
Originally I tried to downsample to 150dpi, which gave the error with factor (2.40????), meaning multiple errors, where the last few digits are different for different pages. So I guessed that images are approximately 150*2.4 = 360 dpi. So I try downsampling to 180. But it seems the images are all slightly off?
Is there a way to specify the factor instead of the dpi?
Is there a way to "round" the factor?

No, there is no way to specify the factor (this is the Adobe specification for distiller params, we are currently limited to those). You cannot specify an approximation for rounding either, without modifying the source code.
You can use a different downsampling algorithm.
[much later]
In fact I just checked the current code, and you must be using an old version of Ghostscript.
The current default downsampling filter is the Bicubic filter, and if you do force the Subsample filter, then the code checks to see if the downsample factor requested is an integer.
If the factor is not an integer but is within 0.1 of an integer then it forces factor to the nearest integer.
If its outside 0.1 of an integer factor then it aborts the subsample filter and switches to Bicubic.
I'd recommend upgrading.
[later edit]
So avoiding the bogus ColorDownsampleOption, the problem is actually not colour images at all, its monochrome images, or more precisely in your case, imagemasks.
I set up this command line:
gs
-sDEVICE=pdfwrite \
-sOutputFile=pdfwrite.pdf \
-dDownsampleColorImages=true \
-dDownsampleGrayImages=true \
-dDownsampleMonoImages=true \
-dColorImageDownsampleThreshold=1 \
-dGrayImageDownsampleThreshold=1 \
-dMonoImageDownsampleThreshold=1 \
-dColorImageDownsampleType=/Bicubic \
-dGrayImageDownsampleType=/Bicubic \
-dMonoImageDownsampleType=/Bicubic \
-dColorImageResolution=72 \
-dGrayImageResolution=72 \
-dMonoImageResolution=100 "gs sample.pdf"
And that produces an error message that the only filter available for monochrome images is Subsample, followed by the error messages you quote about the imprecise factor.
I guess basically this makes my point that an example file is pretty much vital in order to investigate problems.
So there is a problem there, and I will look into it, obviously for monochrome images it should be clamped to the nearest integer resolution, since no other filter is possible. However, Gray and Colour images do work as expected.
Reporting a bug, as I suggested in an early comment would probably have got to this point much sooner. I'd still suggest you do that, so that this is not overlooked.
You may be interested to note that, for me, the resulting file when I don't downsample monochrome images, but do downsample the others, as per the command line above, is 785KB the original being 2.5MB.

Related

Batch PDF Watermark [PDF -> JPG -> PDF]

I am working with over thousands PDF files for a Sheet Music publisher.
All of these PDF files needs a preview PDF. A watermark for PDF files can easily be removed so I am asking for a true way to watermark our PDF:s in a batch operation.
PDF->Apply Watermark->JPG->Back to PDF
How can I do this? Is there a good tool for this operations?

The free route
ImageMagick can do the complete process for you, especially with the composite command's -watermark operator.
#!/bin/sh
# ImageMagick will pick the correct conversion formats based on filename suffixes, or maybe actual binary content?
InputPDF=$1
WatermarkImg=$2
OutputPDF=$3
pdfToImage=pdfToImage.png
imageWithWatermark=imageWithWatermark.png
# Convert PDF to image
convert \
-density 300 \
-trim \
"$InputPDF" \
-quality 100 \
-flatten \
-sharpen 0x1.0 \
$pdfToImage
# Add watermark to intermediate image
composite \
-dissolve 15 \
-tile \
"$WatermarkImg" \
$pdfToImage \
$imageWithWatermark
# Convert intermediate image back to PDF
convert \
$imageWithWatermark \
"$OutputPDF"
# Clean up
rm $pdfToImage $imageWithWatermark
I find the PDF to image conversion acceptable in terms of quality, though you can see some differences when looking at the before and after side-by-side, especially in how bolded glyphs seem less bold:
You can check this good post and its answers for a number of options for converting a PDF to an image, Convert PDF to image with high resolution.
I checked out PDFtoPPM, which was also highly mentioned in that thread, and I still see some degrading of the bolded fonts when converted:
Some more tiling Magick
I used this copyright symbol from Wikimedia Commons and this ImageMagick script:
#!/bin/sh
Infile="Copyright.png"
Outfile="Copyright_tiled.png"
h2=$(convert $Infile -format "%[fx:round(h/2)]" info:)
convert $Infile \
\( -clone 0 -roll +0+"$h2" \) \
+append \
-write mpr:sometile \
+delete \
-size 1224X1584 \
tile:mpr:sometile \
$Outfile
to create this staggered tiling (1224X1584 is the page size (8.5in x 11in) multiplied by 72 px/in, times 2 for a good density of tiles):

And here it is unwatermarked again
#ZachYoung I used some different image magic, also scriptable, the point is:-
Although "What's done cannot be undone" Macbeth (Act 5.1. 63-4) is very true especially within a PDF or image. We also know and expect that it too applies to any PDF (de)constructs. Thus depending on value of a forgery it will always be worth engineering a partially reversed copy, fit for scrutiny or use, but will like the watermarked copy, still not be the original, however all the same, may look almost just as good.
The Idiom implies don't bother yourself about it. Its best not done in the first place.
The nearest to best, is use a watermark exactly the same as the text outlines, like this:-

Convert PDF/Images to a flip based effect in GIF

I want to convert a PDF or sequence-of-images to a flip based effect in GIF (similar to the one below).
Is there any softwares available handy I could use to produce this output? or do I have to write scripts using imageMagicK? please suggest?
Thanks in advance!

Cool project! This isn't production-ready, military-hardened, bullet-proof code, but the following will do most of the heavy lifting as regards getting set up, appending the individual pages, distorting pages as they turn and finally putting the whole lot together in an animated GIF sequence.
#!/bin/bash
################################################################################
# flipbook
# Mark Setchell
#
# Give me 4 pages as parameters and I create an animated GIF book out of them
# called book.gif.
#
# Requires ImageMagick
################################################################################
# Names of the 4 pages
p0=${1:-page0.gif} # Use first arg, or "page-0.gif" if none given
p1=${2:-page1.gif}
p2=${3:-page2.gif}
p3=${4:-page3.gif}
# Get width and height of images - I assume, but do not check all are identical sizes
read w h < <(convert "$p0" -format "%w %h" info: )
((twow=w+w))
# Layout first and last flat double-page spreads
convert "$p0" "$p1" +append frame0.png
convert "$p2" "$p3" +append frame4.png
# Make right page taller and thinner and save as "distorted.png"
((deltah=20*h/100))
((deltaw=20*w/100))
((hplusdeltah=h+deltah))
((wminusdeltaw=w-deltaw))
((hplus2deltah=h+deltah+deltah))
points="0,0 0,$deltah $wminusdeltaw,0 $wminusdeltaw,0 $wminusdeltaw,$hplus2deltah $wminusdeltaw,$hplus2deltah 0,$hplus2deltah 0,$hplusdeltah"
convert "$p1" +matte -virtual-pixel transparent \
-resize ${wminusdeltaw}x${hplus2deltah}! +repage \
-distort Perspective "$points" +repage distorted.png
# Make second frame by overlaying distorted right page ontop of pages 0 and 3
convert "$p0" "$p3" +append \
-bordercolor white -border 0x$deltah \
+repage \
distorted.png \
-geometry +${w}x \
-composite frame1.png
# Make left page taller and thinner and save as "distorted.png"
((deltaw=70*w/100))
((wminusdeltaw=w-deltaw))
points="0,0 0,0 $wminusdeltaw,0 $wminusdeltaw,$deltah $wminusdeltaw,$hplus2deltah $wminusdeltaw,$hplusdeltah 0,$hplus2deltah 0,$hplus2deltah"
convert "$p2" +matte -virtual-pixel transparent \
-resize ${wminusdeltaw}x${hplus2deltah}! +repage \
-distort Perspective "$points" +repage distorted.png
# Make third frame by overlaying distorted left page ontop of pages 0 and 3
convert "$p0" "$p3" +append \
-bordercolor white -border 0x$deltah \
+repage \
distorted.png \
-geometry +${deltaw}x \
-composite frame2.png
# Make left page taller and thinner and save as "distorted.png"
((deltaw=20*w/100))
((wminusdeltaw=w-deltaw))
points="0,0 0,0 $wminusdeltaw,0 $wminusdeltaw,$deltah $wminusdeltaw,$hplus2deltah $wminusdeltaw,$hplusdeltah 0,$hplus2deltah 0,$hplus2deltah"
convert "$p2" +matte -virtual-pixel transparent \
-resize ${wminusdeltaw}x${hplus2deltah}! +repage \
-distort Perspective "$points" +repage distorted.png
# Make fourth frame by overlaying distorted right page ontop of pages 0 and 3
convert "$p0" "$p3" +append \
-bordercolor white -border 0x$deltah \
+repage \
distorted.png \
-geometry +${deltaw}x \
-composite frame3.png
# Make final animation from frame0.png...frame4.png
convert -gravity center -delay 100 frame*.png -background white -extent ${twow}x${hplus2deltah} book.gif
So, if you start with the following as page0.gif, page1.gif, page2.gif and page3.gif...
You will get this as book.gif
If your book has more than 4 pages, you can do four at a time and then append the animations quite simply.
Updated Answer
It seems you are unfortunate enough to have to use Windows - which is very cumbersome in BATCH. I am no expert, but can get around in BATCH a little. I think the script above is pretty easy to translate though. I'll get you started but you will need to do some yourself - you can always ask a new question if you get stuck - questions are free!
The first part of the script just picks up the parameters supplied on the command line, so it'll look like this:
REM Pick up commandline parameters
set p0=%1
set p1=%2
set p2=%3
set p3=%4
Then we need to work out the width and height of the input images, something like this:
REM Get width and height of images in variable "w" and "h"
FOR /F %%A IN ('identify -format "w=%%w\nh=%%h" %p0%') DO set %%A
All the stuff in my original script inside ((..)) is just simple maths which can be done in BATCH using SET /A, so the lines that look like this:
((twow=w+w))
((deltah=20*h/100))
will look like this:
SET /A TWOW=w+w
SET /A DELTAH=20*h/100
The rest is just convert commands - you will need to do a couple of things there:
Replace line continuations at ends of lines, so change \ to ^
Where I use $variable or ${variable}, replace it with %variable%
Double any % signs I have, so % becomes %%
Change \( to ^( - I think
change any single quotes ' to double quotes "
Best to just work through it and see what happens as you convert each line and ask another question if you can't work it out.
There is some good info at these places - ss64 - general, ss64 - set command on BATCH in general. Also, an English guy called Alan Gibson, uses IM with Windows very competently and you can see his scripts here, and also more generally here for inspiration on how to be effective with IM under Windows.

Ghostscript renders ugly text

I'm trying to add the capability to render LaTeX equations to a project I'm working on. To do so, I use XeLaTeX to create a PDF file, which I then render to a (transparent) 96dpi-PNG using Ghostscript.
I'd like to have the rendered LaTeX blend in with the rest of the text (which is rendered using standard .NET GDI+ methods, but that's off-topic), but I can't get a reliably "good" text rendering: the output always looks somehow blurry or otherwise "bad".
Example:
From left to right, the same (small) PDF rendered at 96dpi with Ghostscript, Photoshop, and TexWorks (which I understand uses Ghostscript internally).
The command I use to run Ghostscript is the following:
"C:/Program Files (x86)/gs/gs9.09/bin/gswin32c.exe" \
-q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT \
-dMaxBitmap=500000000 -dAlignToPixels=1 -dGridFitTT=2 \
"-sDEVICE=pngalpha" -dTextAlphaBits=4 \
-dGraphicsAlphaBits=4 "-r96" -dFirstPage=1 -dLastPage=1 \
-sOutputFile="output.png" "input.pdf"
(which I actually pretty much copied from the command ImageMagick calls when converting a PDF file, but that's another story). I tried changing any of the relevant options (dAlignToPixels=0, dGridFitTT=0/1/2, dTextAlphaBits=2/4 [or without this parameter altogether]) and I even tried to render the PDF to 4 times the resolution and then downscale it, without any noticeable improvement.
Yet, I'm sure there must be some way of decently rendering the PDF with Ghostscript (since TexWorks does), although I'm unable to find it.
Any hint? The PDF is this one.

You could try to render your PDF at a higher resolution. 96dpi just isn't enough for text with 11 pt size.
If you use 192dpi and then scale the display of the resulting image to 50% (wherever you use the PNG), these parts should still appear in the same size as befor, but with a higher resolution. What used to be a 4x7 pixels 's' should now be a 8x14 pixels 's'...
Update
Ok, since my explanation seems to have been not comprehendible enough for the OP, here's the deal.
Generate a PDF file containing the word "Test", using Ghostscript. In my case, it is Ghostscript v9.10:
gs \
-o test.pdf \
-sDEVICE=pdfwrite \
-g230x100 \
-c "/Helvetica findfont \
11 scalefont \
setfont \
1 1 moveto \
(Test) show \
showpage"
From this PDF, generate 6 different images depicting the word "Test", using 6 different resolutions. The gs is still Ghostscript v9.10 (to be checked with gs -version):
for i in 1 2 3 4 5 6; do \
gs \
-o t$(( ${i} * 96 )).png \
-r$(( ${i} * 96 )) \
-sDEVICE=pngalpha \
-dAlignToPixels=1 \
-dGridFitTT=2 \
-dTextAlphaBits=4 \
-dGraphicsAlphaBits=4 \
t.pdf ; \
done
This will create the following PNGs, as confirmed by ImageMagick's identify command:
identify -format "%f : %Wx%H pixels -- %b filesize\n" t[1-9]*.png
t96.png : 31x13 pixels -- 475B filesize
t192.png : 61x27 pixels -- 774B filesize
t288.png : 92x40 pixels -- 1.1KB filesize
t384.png : 123x53 pixels -- 1.43KB filesize
t480.png : 153x67 pixels -- 1.76KB filesize
t576.png : 184x80 pixels -- 2.01KB filesize
Create a sample LaTeX document and embed the different images side by side and/or line by line. Here is my sample code:
\begin{document}
Test
\includegraphics[height=7.5pt]{t96.png}
\includegraphics[height=7.5pt]{t96.png}
\includegraphics[height=7.5pt]{t192.png}
\includegraphics[height=7.5pt]{t288.png}
\includegraphics[height=7.5pt]{t384.png}
\includegraphics[height=7.5pt]{t480.png}
\includegraphics[height=7.5pt]{t576.png}
Test\\
{}
Test <== real text
\includegraphics[height=7.5pt]{t96.png} <-- 96 dpi figure
\includegraphics[height=7.5pt]{t192.png} <-- 192 dpi figure
\includegraphics[height=7.5pt]{t288.png} <-- 288 dpi figure
\includegraphics[height=7.5pt]{t384.png} <-- 384 dpi figure
\includegraphics[height=7.5pt]{t480.png} <-- 480 dpi figure
\includegraphics[height=7.5pt]{t576.png} <-- 576 dpi figure
Test <== real text
\end{document}
Here is a screenshot (at 400% zoom) from the PDF created via LuaLaTeX from the above LaTeX code:
The line with the 8 "Test" words has actual text only in the first and the last word. The 6 words in between are images with 96, 96, 192, 288, 384, 480 and 576 dpi.
I hope you can see now clearly how scaling up your image generation to a higher resolution will result in better quality for your final PDF if you include the higher resolution images into your LaTeX code...

You are rendering text at 11 points, at 96 dpi, that works out to about 14 pixels in height which, frankly, is not a lot (and in my output the 's' is 7 pixels high by 4 wide). Looking at your output all 3 look 'blurry' and the Photoshop output looks overly bold in the capital T.
If you don't want it blurred, then don't set TextAlphaBits, or don't set it to such a high value.
I'd also suggest using the current release (9.15).

Can iTextSharp reduce dpi resolution of a pdf?

I'm trying to upload hi-res PDF files to our servers, but would like to generate a smaller PDF file size so that it loads quickly on my web application by reducing the dpi resolution.
Is this something that iTextSharp can do? Or is there another free software that can achieve this?

PDF files, in general, do not have DPI. Raster images embedded in a PDF file do. What you can do, is to extract the images embedded in your PDF file, resize them to a lower resolution, and put them back in your file.
There is a chapter about this topic in the book iText in Action.

Ghostscript is Free Software (if you want), and it can downsample PDFs any way you want (well, downsample the pixel images that may be embedded on its pages).
Example commandline, which downsamples all images to 72dpi (provided they have a resolution that's more than 144dpi). I'll not use the shortest command, but I deliberately try to enumerate all potentially useful parameters, so that you can experiment:
gs \
-o downsampled.pdf \
-sDEVICE=pdfwrite \
-dColorImageDownsampleThreshold=2.0 \
-dGrayImageDownsampleThreshold=2.0 \
-dMonoImageDownsampleThreshold=2.0 \
-dColorImageDownsampleType=/Bicubic \
-dGrayImageDownsampleType=/Bicubic \
-dMonoImageDownsampleType=/Bicubic \
-dDownsampleColorImages=true \
-dDownsampleGrayImages=true \
-dDownsampleMonoImages=true \
-dColorImageResolution=72 \
-dGrayImageResolution=72 \
-dMonoImageResolution=72 \
-dAutoFilterColorImages=false \
-dAutoFilterGrayImages=false \
\
-dEncodeColorImages=true \
-dEncodeGrayImages=true \
-dEncodeMonoImages=true \
-dColorImageFilter=/DCTEncode \
-dGrayImageFilter=/DCTEncode \
-dMonoImageFilter=/CCITTFaxEncode \
input.pdf
If you want to downsample all color images (that is, also the ones from 73dpi to 144dpi), then use -dColorImageDownsampleThreshold=1.0 (Ghostscript's default is =1.5); the same goes for other *ImageDownsampleThreshold settings.
For the *ImageDownsampleTypes -- you can also experiment with values of /Average or /Subsample instead of my suggested /Bicubic. And you are of course als free to use different settings for resolution, sampling type and thresholds across the mono, gray and color image types.

Convert multipage PDF to PNG and back (Linux)

I have a lot of PDF documents that I want to convert to PNG, edit in Gimp, and then save back to the multipage Acrobat file. I'm filling out forms and adding scanned signature, trying to avoid printing, signing, then scanning back in, with the ability to type the information I need to enter.
I've been trying to use Imagemagick to convert to png files, which seems to work fine. I use the command convert -quality 100 -density 300x300 multipage.pdf single%d.png
(I'm not really sure if the quality parameter is right for png).
But I'm having problems with saving back to PDF. Some of the files have the wrong page size, and I've tried every command and procedure I can find, but there are always a few odd sizes. The resolution seems to vary so that it looks good at a certain zoom level, but either a few pages are specified at about 2" wide, or they are 8.5x11 but the others are about 35" wide. I've tried making sure Gimp had the canvass size and resolution correct, and to save the resolution in the file, but that doesn't seem to matter.
The command I use to save the files is convert -page letter -adjoin single*.png multipage.pdf I've tried other parameters, but none seemed to matter.
If anyone has any ideas or alternatives, I'd appreciate it.

"I'm not really sure if the quality parameter is right for PNG."
For PNG output, the -quality setting is very unlike JPEG's quality setting (which simply is an integer from 0 to 100).
For PNG it is composed by two single digits:
The first digit (tens) is (largely) the zlib compression level, and it may go from 0 to 9.
(However the setting of 0 has a special meaning: when you use it you'll get Huffman compression, not zlib compression level 0. This is often better... Weird but true.)
The second digit is the PNG data encoding filter type (before it is compressed):
0 is none,
1 is "sub",
2 is "up",
3 is "average",
4 is "Paeth", and
5 is "adaptive".
In practical terms that means:
For illustrations with solid sequences of color a "none" filter (-quality 00) is typically the most appropriate.
For photos of natural landscapes an "adaptive" filtering (-quality 05) is generally the best.
"I'm having problems with saving back to PDF. Some of the files have the wrong page size, and I've tried every command and procedure I can find [...] but either a few pages are specified at about 2" wide, or they are 8.5x11 but the others are about 35" wide."
Not having available your PNG files, I created a few simple ones with different dimensions to verify the different commands (as I wasn't sure myself any more). Indeed, the one you used:
convert -page letter -adjoin single*.png multipage.pdf
does create all PDF pages in (same) letter size, but it places my sample of (differently sized) PNGs always on the lower left corner of the PDF page. (Should a PNG exceed the PDF page size, it does scale them down to make them fit -- but it doesn't scale up smaller PNGs to fill the available page space.)
The following modification to the command will place the PNGs into the center of each PDF page:
convert \
-page letter \
-adjoin \
single*.png \
-gravity center \
multipage.pdf
If this is still not good enough for you, you can enforce a (possibly non-proportional!) scaling to almost fill the letter area by adding a -scale '590!x770!' parameter (this will leave a border of 11 pt at each edge of the page):
convert \
-page letter \
-adjoin \
single*.png \
-gravity center \
-scale '590!x770!' \
multipage.pdf
To leave away the extra border, use -scale '612!x792!'. -- Should you want only upward scaling to happen if required while keeping the aspect ratio of the PNG, use -scale '590<x770<':
convert \
-page letter \
-adjoin \
single*.png \
-gravity center \
-scale '590<x770<' \
multipage.pdf

Why not just use Xournal? That's what I use to annotate PDFs

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas