Tools to convert multipage PDF to multipage TIFF [closed] - pdf

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 8 years ago.
I'm writing a small application to convert several multipage PDFs to multipage TIFF files. Per the other questions and answers on this site, I've tried both Ghostscript and ImageMagick; however, both only convert the first page when I run them. Are there any other tools I can use to accomplish this, preferably open-source ones?

Actually Ghostscript can convert a multi-page PDF into a multi-page TIFF. Just be sure to have a rather recent version of Ghostscript installed. Be careful when examining the resulting TIFF as your viewer may not show all the pages of the TIFF or it may not be apparent how to advance to subsequent pages in the TIFF.
Consider the following 3 commands:
gs \
-o multipage-tiffg4.tif \
-sDEVICE=tiffg4 \
multipage-input.pdf
and
gs \
-o multipage-tiff24nc.tif \
-sDEVICE=tiff24nc \
multipage-input.pdf
and
gs \
-o multipage-tiff32nc.tif \
-sDEVICE=tiff32nc \
multipage-input.pdf
Each of these commands generates a multi-page TIFF, but with different output devices:
The tiffg4 one is black and white (1 bit per pixel, G4 fax encoding) and uses a resolution of 204dpi by 196dpi (the standard fax resolution).
The tiff24nc one is in RGB color (with 8 bits per color component) and uses a resolution of 72dpi by 72dpi.
The tiff32nc one is in CMYK color (with 8 bits per color component), also using a resolution of 72dpi by 72dpi.
The Ghostscript documentation lists a number of additional output devices for TIFF.
All the resolution values for the TIFF files result from Ghostscript's default settings. If you want to override these, for example because you require 600dpi by 600dpi, just add
-r600x600
to any of the above command lines.
To prove that you have multipage TIFFs, use the following command:
identify multipage-tiff*.tif
(identify is a command from the ImageMagick package, which you seem to have installed anyway.) As a result, you should see multiple lines for each of the *.tif files -- with each line representing 1 page of the respective *.tif.
I suspect it may have worked all along for you, and that you were just unable to recognize the multiple pages in your resulting TIFFs. Not all TIFF viewers can display them; those that can't show only the first page, which may have fooled you.
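For use inside an application, the commands above could be wrapped roughly as follows. This is a minimal sketch, not a definitive implementation: the function names `build_tiff_command` and `convert_folder` are made up for illustration, and it assumes the Ghostscript executable is reachable as `gs` (on Windows the console binary usually has a different name).

```python
import subprocess
from pathlib import Path


def build_tiff_command(pdf_path, tiff_path, device="tiffg4", dpi=None):
    """Return the Ghostscript argument list for a PDF -> multi-page TIFF run.

    Assumption: the executable is called "gs" and is on PATH.
    """
    cmd = ["gs", "-o", str(tiff_path), f"-sDEVICE={device}"]
    if dpi is not None:
        # -r overrides the device's default resolution, e.g. -r600x600
        cmd.append(f"-r{dpi}x{dpi}")
    cmd.append(str(pdf_path))
    return cmd


def convert_folder(folder, device="tiff24nc", dpi=300):
    """Convert every *.pdf in `folder` to a TIFF with the same base name."""
    for pdf in sorted(Path(folder).glob("*.pdf")):
        cmd = build_tiff_command(pdf, pdf.with_suffix(".tif"), device, dpi)
        subprocess.run(cmd, check=True)  # raises CalledProcessError if gs fails
```

Calling `build_tiff_command("multipage-input.pdf", "multipage-tiffg4.tif", dpi=600)` yields the first command shown above with `-r600x600` added; `convert_folder` runs one such command per PDF.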

You can also use the Imagick PHP extension if you would rather do this in PHP than in a bash script.

Related

Ghostscript to compress a batch of PDFs

I have no experience of programming.
My PDFs won't display images on the iPad in PDFExpert or GoodNotes as the images are in JPEG2000, from what I could find on the internet.
These are large PDFs, up to 1500-2000 pages with images. One of these was an 80MB or so file. I tried printing it with Foxit to convert the images to JPG from JPEG2000, but the file size jumped to 800MB... plus it's taking too long.
I stumbled upon Ghostscript, but I have NO clue how to use the command line interface.
I am very short on time. Pretty much need a step by step guide for a small script that converts all my PDFs in one go.
Very sorry about my inexperience and helplessness. Can someone spoon-feed me the steps for this?
EDIT: I want to switch the JPEG2000 to any other format that produces less of an increase in file size and causes a minimal loss in quality (within reason). I have no clue how to use Ghostscript. I basically want to change the compression on the images to something that will display correctly on the iPad while maintaining the quality of the rest of the text, as well as the embedded bookmarks.
I'll repeat that I have NO experience with command line...I don't even know how to point GS to the folder my PDFs are in...
You haven't really said what it is you want. 'Convert' PDFs how, exactly?
Note that switching from JPX (JPEG2000) to JPEG will result in a quality loss, because the image data will be quantised (with a different quantisation scheme to JPX) by the JPEG encoder. You can use a lossless compression scheme instead, but then you won't get the same kind of compression. Whatever you use, you won't match the compression ratio of JPX; the result will be larger.
A simple Ghostscript command would be:
gs -sDEVICE=pdfwrite -o out.pdf in.pdf
Because JPEG2000 encoding is (or at least, was) patent encumbered, the pdfwrite device doesn't write images as JPX. By default it will write them several times with different compression schemes, and then use the one that gives the best compression (practically always JPEG).
Getting better results will require a more complex command line, but you'll also have to be more explicit about what exactly you want to achieve, and what the perceived problem with the simplistic command line is.
[EDIT]
Well, giving help on executing a command line is a bit off-topic for Stack Overflow, this is supposed to be a site for software developers :-)
Without knowing what operating system you are using, it's hard to give you detailed instructions. I also have no idea what an iPad uses; I don't generally use Apple devices, and my only experience is with Macs.
Presumably you know where (the directory) you installed Ghostscript. Either open a command shell there and type the command ./gs, or execute the command by giving the full path, such as:
/usr/bin/gs
I thought the arguments on the command line were self-explanatory, but....
The -sDEVICE=pdfwrite switch tells Ghostscript to use the pdfwrite device, as you might guess from the name, that device writes PDF files as its output.
The -o switch is the name (and full path if required) of the output file.
The final argument is the name (and again, full path if it's not in the current directory) of the input file.
So a command might look like:
/usr/bin/gs -sDEVICE=pdfwrite -o /home/me/output.pdf /home/me/input.pdf
Or if Ghostscript and the input file are in the same directory:
./gs -sDEVICE=pdfwrite -o out.pdf input.pdf
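To run the same pdfwrite command over a whole folder "in one go", something along these lines might work. It is only a sketch under stated assumptions: the helper names are invented for illustration, Python is assumed to be installed, and the Ghostscript executable is assumed to be callable as `gs` (adjust the `GS` constant if yours has a different name or needs a full path).

```python
import subprocess
from pathlib import Path

# Assumption: "gs" is on PATH. Replace with the full path to your
# Ghostscript executable if it is not.
GS = "gs"


def build_pdfwrite_command(src, dst, gs=GS):
    """Same arguments as the simple command above, for one input/output pair."""
    return [gs, "-sDEVICE=pdfwrite", "-o", str(dst), str(src)]


def plan_batch(in_dir, out_dir):
    """Pair each input PDF with an output path of the same name in out_dir."""
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    return [(pdf, out_dir / pdf.name) for pdf in sorted(in_dir.glob("*.pdf"))]


def run_batch(in_dir, out_dir):
    """Convert every PDF in in_dir, writing results to out_dir."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for src, dst in plan_batch(in_dir, out_dir):
        print(f"converting {src.name} ...")
        subprocess.run(build_pdfwrite_command(src, dst), check=True)
```

For example, `run_batch("/home/me/pdfs", "/home/me/pdfs-converted")` would write one converted file per input PDF into a separate folder, leaving the originals untouched.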

Preflight program for PDFs using PoDoFo or anything else open source? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I have to automate a preflight check on PDF documents. The preflight consists of:
Detect the resolution of images in an existing document and change them to 300dpi if they are not already at that resolution.
Detect the colorspace of images and if not in CMYK, then convert them to CMYK using color profiles.
Detect whether or not fonts are embedded in an existing PDF document, and correct this problem by substituting fonts. (or drawing font outlines — I'm not sure about this part).
Just wondering if this can be done using PoDoFo or any other open source project out there, or if I really need to order some proprietary software costing between $2K and $6K. My hosting environment is on Linux and supports PHP, Perl, Python, Ruby, Java.
Any ideas?
I'm not aware of any ready-made Open Source software which meets your requirements.
Only a part of it could be solved by writing your own shell script (or other program).
Detect resolution of images.
Run pdfimages -list some.pdf to output a list of images contained in the PDF as well as their dimensions... seemingly. But what is not obvious about it: these dimensions are the ones of the raw image (as embedded in the PDF). This could be 720x720 pixels. However, if rendered onto a 10x10 inch square of the page this image will be 72 DPI on the page. If rendered on a 1x1 inch square, it will be 720 DPI. Both types of 'rendering' inside a PDF can be made from the same embedded raw image, and it is the context of the current 'graphic state' which determines which is applied. So to determine the actual DPI of an image as it appears on the page requires some additional PDF parsing...
In any case, you can tell Ghostscript to re-sample images to 300 dpi, and to use a 'threshold' for this. (Ghostscript will never "upsample" an image; it only downsamples those that exceed the threshold. Upsampling almost never makes sense: it only blows up the file size with no return in terms of higher quality.)
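The DPI arithmetic described above can be written out as a small sketch. The function names are illustrative only, and the comparison in `needs_downsampling` is an approximation of the threshold idea, not Ghostscript's exact internal rule.

```python
def effective_dpi(pixels, rendered_inches):
    """Resolution of an image as placed on the page, along one axis."""
    return pixels / rendered_inches


def points_to_inches(points):
    """PDF page coordinates use 72 points per inch by default."""
    return points / 72.0


def needs_downsampling(pixels, rendered_inches, target_dpi=300, threshold=2.0):
    """Approximate the 'threshold' rule: downsample only when the effective
    resolution exceeds target_dpi * threshold (assumed comparison)."""
    return effective_dpi(pixels, rendered_inches) > target_dpi * threshold
```

With the numbers from the example, a 720-pixel-wide image placed 10 inches wide gives `effective_dpi(720, 10)` = 72, while the same image placed 1 inch wide gives 720, which is above the 600 dpi cut-off implied by a 300 dpi target and a threshold of 2.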
Convert colors to colorspace CMYK using ICC profiles.
The most recent versions of Ghostscript can do that. See also the most recent Ghostscript documentation describing its support for ICC.
Embed un-embedded fonts.
Running (and evaluating the results of) pdffonts some.pdf will show you which fonts are not embedded.
Ghostscript can embed un-embedded fonts.
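Evaluating the pdffonts output could be automated along these lines. This is a rough sketch: it assumes the usual pdffonts table layout (two header lines, then one row per font whose last columns are emb, sub, uni and the two-part object ID), and it only reports the first whitespace-separated token of each font name. The function names are invented for illustration.

```python
import subprocess


def unembedded_fonts(pdffonts_output):
    """Return names of fonts whose 'emb' column is 'no'.

    Assumes each font row ends with: emb sub uni <object number> <generation>,
    so the 'emb' value is the fifth token counted from the right.
    """
    missing = []
    for line in pdffonts_output.splitlines()[2:]:  # skip the two header lines
        tokens = line.split()
        if len(tokens) < 6:
            continue  # blank or unexpected line
        if tokens[-5] == "no":
            missing.append(tokens[0])
    return missing


def check_pdf(path):
    """Run pdffonts on a file and list the fonts that are not embedded."""
    result = subprocess.run(
        ["pdffonts", str(path)], capture_output=True, text=True, check=True
    )
    return unembedded_fonts(result.stdout)
```

An empty list from `check_pdf("some.pdf")` would mean every listed font is embedded; anything else names the fonts that still need fixing.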
So one Ghostscript command that would cover most of your requirements is this:
gs \
-o cmyk.pdf \
-sDEVICE=pdfwrite \
-sColorConversionStrategy=CMYK \
-sProcessColorModel=DeviceCMYK \
-sOutputICCProfile=/path/to/your.icc \
-dDownsampleColorImages=true \
-dColorImageDownsampleThreshold=2 \
-sColorImageDownsampleType=Bicubic \
-dColorImageResolution=300 \
-dDownsampleGrayImages=true \
-dGrayImageDownsampleThreshold=2 \
-sGrayImageDownsampleType=Bicubic \
-dGrayImageResolution=300 \
-dDownsampleMonoImages=true \
-dMonoImageDownsampleThreshold=2 \
-sMonoImageDownsampleType=Bicubic \
-dMonoImageResolution=1200 \
-dSubsetFonts=true \
-dEmbedAllFonts=true \
-sCannotEmbedFontPolicy=Error \
-c ".setpdfwrite<</NeverEmbed[ ]>> setdistillerparams" \
-f some.pdf
This command would downsample all images whose resolution is more than double the wanted resolution (*ImageDownsampleThreshold=2). It would also apply all these settings uniformly to any input file (unlike specialized PDF preflighting software, which would apply selective 'fixups' based on the results of 'checks' for particular properties).
Lastly, I cannot see what made you think you'd have to spend $2k to $6k in case you'd have to resort to closed-source, commercial preflighting software. (My favorite in this field is the very powerful callas pdfToolbox6, which even has a version that runs as CLI on Linux; its basic version costs 500 €.)
My background is in printing, so please keep this in mind when reading my answer. The items you propose to do seem somewhat straight forward, but when you get into the nitty gritty of it, there's a lot of print-industry knowledge that goes into these operations.
Here's some quick feedback to your bullet points:
You won't want to upsample a low-res image to 300 dpi, as it will decrease image quality (via re-interpolation) and increase file size.
You need to be careful with color conversions. There may be certain builds of RGB which you'd want to convert to black only. Or what happens if someone supplies a file which is already CMYK and tagged with the incorrect profile?
Font detection: it is very complicated to substitute fonts. If you don't have the exact same font as the originator, you could end up with text reflow problems. To own that font, you'll have to pay for a license. You also can't convert fonts to outlines without them being embedded.
My recommendation is to look at a commercial package for preflighting. These developers have invested years into developing their programs and are experts within the field of printing. The challenging part will be finding ones that are unix based in your price range. Most are designed for Windows or Mac. Callas has a linux cl version but not at the price listed. You'd need the server version.
What type of volume are you planning to run through it?
Did you try Enfocus PitStop Pro? Contact their support department with your specific request. They have tons of PDF preflight examples and will be happy to help you out.

Compressing JPG page to PDF with various compressions/settings [closed]

Closed 5 years ago.
I would like to take a single page jpg, and experiment with various pdf compression settings and other settings (to analyse resultant pdf size and quality) - can anyone point me towards decent tools to do this, and any useful docs/guides?
Adobe Acrobat Distiller, Ghostscript, possibly others. Acrobat has its own manual; the Ghostscript documentation on PostScript to PDF conversion can be found at:
http://svn.ghostscript.com/ghostscript/tags/ghostscript-9.02/doc/Ps2pdf.htm
(Print your JPEG to a PostScript file before starting.)
If your original is a JPEG, then the best quality output will be to do nothing at all to it, simply wrap it up in PDF syntax.
If you insist on downsampling the image data (which is the only way you are going to reduce the size of a JPEG image) then you would be advised not to use DCT (JPEG) compression in the PDF file, as this will introduce objectionable artefacts.
You should use a lossless compression scheme instead, e.g. "Flate".
Your best bet would be to go back to an original which has not been stored as a JPEG, downsample in a decent image application and then convert to JPEG and wrap that up in a PDF.
Docotic.Pdf library can be used to add JPEGs (with or without recompression) to PDF.
Please take a look at the sample that shows how to recompress images before adding them to PDF. With the help of the library you can recompress existing images too.
Disclaimer: I work for Bit Miracle, vendor of the library.
If you're OK working with .NET on Windows, my company, Atalasoft, has tools for creating image PDFs. You can tweak the compression very easily using code like this:
public void WriteJpegPdf(AtalaImage image, Stream outStream)
{
    PdfEncoder encoder = new PdfEncoder();
    encoder.JpegQuality = 60; // 0 - 100
    encoder.Save(outStream, image, PdfCompressionType.Jpeg);
}
This is the simplest way of hitting the jpeg quality. It will override your setting if the image isn't 24 bit rgb or 8 bit gray.
If you are concerned with encoding a bunch of files but want fine-grained control over compression, the encoder has an event, SetEncoderCompression, that is invoked before the image is encoded to let you see what the encoder chose and you can override it if you like.
FWIW, I wrote most of the PDF Encoder and the underlying layer that exports the actual PDF.

Tools for JPEG optimization? [closed]

Closed 6 years ago.
Do you know of any tools (preferably command-line) to automatically and losslessly optimize JPEGs that I could integrate into our build environment? For PNGs I'm currently using PNGOUT, and it generally saves around 40% bandwidth/image size.
At the very least, I would like a tool that can strip metadata from the JPGs - I noticed a strange case where I tried to make thumbnail from a photograph, and couldn't get it smaller than 34 kB. After investigating more, I found that the EXIF data was still part of the image, and the thumbnail was 3 kB after removing the metadata.
And beyond that, is it possible to further optimize JPGs losslessly? The PNG optimizer tries different compression strategies, random initialization of the Huffman encoding, etc.
I am aware that most savings come from the JPEG quality parameter, and that it's a rather subjective measure. I'm only looking for a tool that can be run as a build step and that losslessly squeezes a few bytes from the images.
I wrote a GUI for all image optimization tools I could find, including MozJPEG and jpegoptim that optimize Huffman tables, progressive scans, and (optionally) remove invisible metadata.
If you don't have a Mac, I also have a basic web interface that works on any platform.
I use libjpeg for lossless operations. It contains a command-line tool, jpegtran, that can do all you want. With the command-line option -copy none all the metadata is stripped, and -optimize does a lossless optimization of the Huffman compression. You can also convert the images to progressive mode with -progressive, but that might cause compatibility problems (does anyone know more about that?)
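As a build step, the jpegtran options mentioned here might be wrapped like this. It is a sketch only: the function names are hypothetical, jpegtran is assumed to be on PATH, and the "keep the result only if it is smaller" policy is an added design choice rather than something jpegtran does itself.

```python
import os
import subprocess
import tempfile


def build_jpegtran_command(src, dst, strip_metadata=True, progressive=False):
    """Argument list for a lossless jpegtran pass from src to dst."""
    cmd = ["jpegtran", "-copy", "none" if strip_metadata else "all", "-optimize"]
    if progressive:
        cmd.append("-progressive")
    cmd += ["-outfile", str(dst), str(src)]
    return cmd


def optimize_in_place(path, **options):
    """Optimize into a temp file next to `path`; replace the original only
    if the result is smaller. Returns True if the file was replaced."""
    folder = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(suffix=".jpg", dir=folder)
    os.close(fd)
    try:
        subprocess.run(build_jpegtran_command(path, tmp, **options), check=True)
        if os.path.getsize(tmp) < os.path.getsize(path):
            os.replace(tmp, path)
            return True
        return False
    finally:
        if os.path.exists(tmp):
            os.remove(tmp)  # clean up when the original was kept or jpegtran failed
```

Writing to a temporary file first means a failed or unhelpful run never damages the original image.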
[WINDOWS ONLY]
RIOT(Radical Image Optimization Tool)
This is the greatest image optimization tool I have found!
http://luci.criosweb.ro/riot/
You can easily get a 10MB image down to 800KB through sub-sampling.
It supports PNG, GIF, and JPEG.
It even integrates into context menus so you can send pictures straight there.
Allows you to rotate, re-size, compress to specified KB's, and more. Also has plugins for GIMP and IrfanView and other things.
There is also a DLL available if you want to incorporate it into your own programs or java script / c++ program.
Another alternative is http://pnggauntlet.com/ PNGGAUNTLET takes forever but it does a pretty good job.
[WINDOWS ONLY]
A new service called JPEGmini produces incredible results. A shame that it's online only. Edit: It's available for Windows and Mac now
Tried a number of the suggestions above - I personally was after lossless compression.
My sample image had an original size of 67,737 bytes.
Using kraken.io, it went down to 64,718
Using jpegtran, it went down to 64,718
Using yahoo smush-it, it went down to 61,746
Using imagemagick (-strip), it went down to 65,312
The smush.py option looks promising, but the installation was too complex for me to do quickly
jpegrescan looks promising too, but seems to be unix and I'm using windows
jpegmini is NOT lossless, but I can't tell the difference (down to 22,172)
plinth's Atalasoft jpegstripper app does not work on my Windows 7
jpegoptim is not windows - no good for me
Riot (keeping quality at 100%) got it down to 63,416 and with chroma subsampling set to high, it got it down to 61,912 - I don't know if that is lossless or not though, and I think it looks lighter than the original.
So my verdict is yahoo smushit if it must be lossless
I would try ImageMagick. It has tons of command-line options, it's free and has a nice license.
http://www.imagemagick.org
There seems to be an option called Strip that may help you:
http://www.imagemagick.org/script/command-line-options.php#strip
ImageOptim is really slick. The command line option posted by the author will populate the GUI and show progress. I used jpegtran for optimizing and converting to progressive, then ImageOptim for further progressive optimizations and for other file types.
Reuse of script code also found in this forum (all files replaced in place):
jpegtran
find "$DIR" -type f \( -name "*.jpg" -or -name "*.jpeg" -or -name "*.JPG" \) -print0 |
while IFS= read -r -d '' file; do
  # quoting keeps file names with spaces intact
  echo "found $file for optimizing..."
  jpegtran -copy comments -optimize -progressive -outfile "$file" "$file"
done
ImageOptim
find "$DIR" -type f \( -name "*.jpg" -or -name "*.png" -or -name "*.gif" \) -print0 |
while IFS= read -r -d '' file; do
  echo "found $file for optimizing..."
  open -a ImageOptim.app "$file"
done
In case anyone's looking, I've written an offline version of Yahoo's Smush.it. It will losslessly optimise pngs, jpgs and gifs (animated and static):
http://github.com/thebeansgroup/smush.py
You can use jpegoptim, which losslessly optimizes JPEG files by default. The --strip-all option strips all extra embedded info. You can also specify a lossy mode with the --max switch, which is useful when you have images saved with a very high quality setting that is not necessary, e.g. for web content.
You get similar optimization as with jpegtran (see answer by OutOfMemory) but jpegoptim can't save to progressive jpegs.
I've written a command line tool called 'picopt' (similar to ImageOptim) that uses external programs to optimize JPEGs, PNGs, GIFS, animated GIFS and even comic book archive contents (CBR/CBZ).
This is suitable for use with homebrew on OS X or Linux systems where you have installed tools like jpegrescan, jpegtran, optipng, gifsicle, etc.
https://github.com/ajslater/picopt
I too would recommend ImageMagick. It has a command line option to remove EXIF metadata
mogrify -strip image.jpg
There are plenty of other tools out there that do the same thing.
As far as recompressing JPEGs goes, don't. JPEGs are lossy to start with, so any form of recompression is only going to hurt image quality. However, if you have losslessly encoded images, some encoders do a better job than others. I have noticed that JPEGs done with Photoshop consistently look better than when encoded with ImageMagick (despite the same file size) due to complicated reasons. Furthermore (and this is relevant to you), I know that at least Photoshop can save JPEGs as optimized, which means they drop compatibility with some stuff that you probably don't care about to save a couple of KB. Also, make sure you don't have any colour profiles embedded and you may be able to save another couple of KB.
I would recommend using http://kraken.io. It's an ultra-fast web app which will optimize your PNG and JPEG files far better than smush.it does.
I recommend using JpegOptim. It's free and really nice: you can specify the quality, the size you want, and so on, and it is easy to use from the command line.
May I recommend this for near-transparency:
convert 'yourfile.png' ppm:- | jpeg-recompress -t 97 -q veryhigh -a -m smallfry -s -r -S disable - yourfile.jpg
It uses imagemagick's convert and jpeg-recompress from jpeg-archive.
Both are open-source and work on Windows, Mac and Linux. You may want to tweak the options above for different quality expectations.

Latex using eps images builds slowly [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
Working with a latex document with eps images as in the example below...
\documentclass[11pt]{paper}
\usepackage[dvips]{graphicx}
\usepackage{fullpage}
\usepackage{hyperref}
\usepackage{amsmath}
\DeclareMathSizes{8}{8}{8}{8}
\author{Matt Miller}
\title{my paper}
\begin{document}
\begin{figure}[!ht]
\begin{center}
\includegraphics[width=2in] {Figuer.eps}
\end{center}
\caption{Figure\label{fig:myFig}}
\end{figure}
\end{document}
When I go to build my latex document, the time it takes to build increases over time. Are there any tips or tricks to help speed up this process?
latex paper.tex; dvipdf paper.dvi
Some additional ideas:
try making a simpler figure (e.g. if it's a bitmapped figure, make a lower-resolution one or one with a low-resolution preview)
use pdflatex and have the figure be a .jpg, .png, or .pdf source.
I generally take the latter approach (pdflatex).
How big are the eps files? Latex only needs to know the size of the bounding box, which is at the beginning of the file.
dvips (not dvipdf) shouldn't take too much time since it just needs to embed the eps into the postscript file.
dvipdf, on the other hand has to convert the eps into pdf, which is expensive.
Indeed, you can use directly
pdflatex paper.tex
Only a few changes are required.
Convert your graphics from EPS to PDF before running pdflatex. You need to do it only once:
epstopdf Figuer.eps
It will produce Figuer.pdf, which is suitable for pdflatex. In your example dvipdf does this conversion on every build.
In the document use
\usepackage[pdftex]{graphicx} % not [dvips]
And to include graphics, omit the extension:
\includegraphics[width=2in] {Figuer} % but {Figuer.pdf} works too
It will choose Figuer.pdf when compiled by pdflatex and Figuer.eps when compiled by latex. So the document remains compatible with legacy latex (just remember to change the driver option passed to \usepackage{graphicx}).
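The "convert only once" step could be scripted so that epstopdf runs just for figures whose PDF is missing or older than the EPS source. This is a minimal sketch with made-up function names; it assumes epstopdf is on PATH and writes Figuer.pdf next to Figuer.eps, as in the command above.

```python
import subprocess
from pathlib import Path


def needs_conversion(eps, pdf):
    """True if the PDF is missing or older than its EPS source."""
    eps, pdf = Path(eps), Path(pdf)
    return not pdf.exists() or pdf.stat().st_mtime < eps.stat().st_mtime


def convert_stale_figures(folder="."):
    """Run epstopdf only for figures whose PDF is missing or out of date."""
    converted = []
    for eps in sorted(Path(folder).glob("*.eps")):
        pdf = eps.with_suffix(".pdf")
        if needs_conversion(eps, pdf):
            subprocess.run(["epstopdf", str(eps)], check=True)
            converted.append(pdf)
    return converted
```

Running `convert_stale_figures()` before each `pdflatex paper.tex` keeps the expensive EPS to PDF conversion out of the normal build unless a figure has actually changed.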
Reducing the file size of your EPS files might help. Here are some ideas how to do that.
If you have the original image as JPEG, PNG, TIFF, GIF (or any other sampled image), reduce the original file size with whatever tools you have, then convert to EPS using sam2p. sam2p gives you much smaller EPS file sizes than what you get from most popular converters.
If your EPS is vector graphics, convert it to PDF using ps2pdf14 (part of Ghostscript), then convert back to eps using pdftops -eps (part of xpdf). This may reduce the EPS file size a lot.
As a quick fix, try passing the [draft] option to the graphicx package.
Are you using a DVI previewer or going straight to pdf?
If you go all the way to pdf, you'll pay the cost of decoding and re-encoding (I used to have that problem with Visio diagrams). However, if you can generate PS files most of the time or work straight with the DVI, the experience would be manageable.
Also, some packages will create .pdf files for you from figures, which you can then embed (I do that on my mac)