ImageMagick is converting only the first page of the pdf - pdf

I am having some trouble with ImageMagick.
I have installed GhostScript v9.00 and ImageMagick-6.6.7-1-Q16 on Windows 7 - 32Bit
When I run the following command in cmd
convert D:\test\sample.pdf D:\test\pages\page.jpg
only the first page of the pdf is converted to pdf. I have also tried the following command
convert D:\test\sample.pdf D:\test\pages\page-%d.jpg
This creates the first jpg as page-0.jpg but the other are not created.
I would really appreciated if someone can shed some light on this. Thanks.
UPDATE:
I have ran the command using -debug "All"
one of the many lines out put says:
2011-01-26T22:41:49+01:00 0:00.727 0.109u 6.6.7 Configure Magick[5800]: nt-base.c/NTGhostscriptGetString/1008/Configure
registry: "HKEY_CURRENT_USER\SOFTWARE\GPL Ghostscript\9.00\GS_DLL" (failed)
Could it maybe have something to do with GhostScript after all?

You can specify which page to convert by putting a number in [] after the filename:
convert D:\test\sample.pdf[7] D:\test\pages\page-7.jpg
It should have, however, converted all pages to individual images with your command.

By the way if you need to convert first and second pages then provide in array comma separated values
convert D:\test\sample.pdf[0,1] D:\test\pages\page.jpg
Resulting JPEG files will be named:
for page 1: page-0.jpg
for page 2: page-1.jpg
You can also do
convert D:\test\sample.pdf[10,15,20-22,50] D:\test\pages\page.jpg
Resulting JPEG files will be named:
for page 11: page-10.jpg
for page 16: page-15.jpg
for page 21: page-20.jpg
for page 22: page-21.jpg
for page 23: page-22.jpg
for page 51: page-50.jpg
May be it will help to someone.

According to the site admin at the ImageMagick forum:
ImageMagick uses the pngalpha device when it finds an Adobe
Illustrator PDF. Many of these are a single page. Ideally, Ghostscript
would support a device that allows multiple PDF pages with
transparency but it doesn't...
Easy fix. Edit delegates.xml and change pngalpha to pnmraw.
This worked for me. I don't know if it introduces any other problems however.
See this post from their forums.

I found this solution which convert all pages in the pdf to a single jpg image:
montage input.pdf -mode Concatenate -tile 1x output.jpg
montage is included in ImageMagick.
Tested on ImageMagick 6.7.7-10 on Ubuntu 13.04.

I ran into similar problem with GhostScript. This can be solver with using %03d iterator in the output file name. Here is example:
gs -r300 -dNOPAUSE -dBATCH -sDEVICE#pngalpha -sOutputFile=output-%03d.png input.pdf
Here is the reference with detailed information: https://ghostscript.com/doc/current/Devices.htm

Related

GROFF PDFPIC converted w ImageMagick to .ms document causes "troff: sample.ms:18: division by zero" and leads images to show very right of the pdf doc

I converted my original image to pdf with ImageMagick. If viewed independently, the pdf image looks perfectly normal.
sample.ms :
.PDFPIC Figure_1.pdf
Once I try to compile my .ms document with the following command:
groff -ms sample.ms -U -T pdf > sample.pdf
I get the following error from groff:
troff: sample.ms:1: division by zero
The document does compile but it looks like this: image is way to the right of the page to the point its sometimes almost completely out of the page.
I was having the same problem and it seems like the PDFs convert generates are corrupt in some way.
I ended up using convert img.png img.tiff and then tiff2pdf img.tiff > img.pdf. Including img.pdf then worked just fine.
I used tiff2pdf just because that's what I had installed, but any other program should work too if it generates valid PDF.

Why text is missing after convert this PDF to image using ImageMagick/Ghostscript?

Text is missing after convert this pdf to image(png or jpg), but there no any error log.
Use ImageMagick:
convert -density 150 -quality 100 "d:/t/pdf/fp.pdf" -alpha Remove "d:/t/pdf/5/fp.png"
Use Ghostscript (testing with version 9.23 and 9.25):
gswin64 -dSAFER -dBATCH -dNOPAUSE -r300 -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -sDEVICE=jpeg -sOutputFile=D:\t\pdf\123.jpg D:\t\pdf\fp.pdf
Anyone know what the reason and how to solve it? Thx.
PDF File for testing
image 1 image 2
There are two CIDFonts (STSong-Light and AdobeKaitiStd-Regular) used but not embedded. This means that a substitute font must be used. When run through Ghostscript this produces the following transcript:
GPL Ghostscript GIT PRERELEASE 9.26 (2018-09-13)
Copyright (C) 2018 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 2.
Page 1
Can't find CID font "AdobeKaitiStd-Regular".
Attempting to substitute CID font /Adobe-GB1 for /AdobeKaitiStd-Regular, see doc
/Use.htm#CIDFontSubstitution.
The substitute CID font "Adobe-GB1" is not provided either. attempting to use fa
llback CIDFont.See doc/Use.htm#CIDFontSubstitution.
Loading a TT font from %rom%Resource/CIDFSubst/DroidSansFallback.ttf to emulate
a CID font Adobe-GB1 ... Done.
Can't find CID font "AdobeKaitiStd-Regular".
Attempting to substitute CID font /Adobe-GB1 for /AdobeKaitiStd-Regular, see doc
/Use.htm#CIDFontSubstitution.
Can't find CID font "AdobeKaitiStd-Regular".
Attempting to substitute CID font /Adobe-GB1 for /AdobeKaitiStd-Regular, see doc
/Use.htm#CIDFontSubstitution.
Loading NimbusSans-Regular font from %rom%Resource/Font/NimbusSans-Regular... 71
35536 5791889 4867288 3488798 3 done.
Can't find CID font "STSong-Light".
Attempting to substitute CID font /Adobe-GB1 for /STSong-Light, see doc/Use.htm#
CIDFontSubstitution.
Loading NimbusMonoPS-Regular font from %rom%Resource/Font/NimbusMonoPS-Regular..
. 10713600 9353422 4987912 3610458 3 done.
**** Error: Executing Do inside a text block, attempting to recover
Output may be incorrect.
>>showpage, press <return> to continue<<
So you can see two fonts being substituted, and then a more concrete problem. Your PDF file executes an image operator inside a text block, which is illegal. However for me the output is apparently correct.
[EDIT]
There is some odd behaviour here. I downloaded the 64-bit release code last night and tried that, and I do see the error. The back channel transcript contains this :
Can't find CID font "AdobeKaitiStd-Regular".
Attempting to substitute CID font /Adobe-GB1 for /AdobeKaitiStd-Regular, see doc
/Use.htm#CIDFontSubstitution.
Loading NimbusSans-Regular font from %rom%Resource/Font/NimbusSans-Regular... 77
20460 6369217 2670672 1276767 3 done.
**** Error: can't process embedded font stream,
attempting to load the font using its name.
Output may be incorrect.
**** Error reading a content stream. The page may be incomplete.
Output may be incorrect.
Loading NimbusMonoPS-Regular font from %rom%Resource/Font/NimbusMonoPS-Regular..
. 11808228 10439970 2690872 1310356 3 done.
**** Error: Executing Do inside a text block, attempting to recover
Output may be incorrect.
**** Error: File did not complete the page properly and may be damaged.
Output may be incorrect.
Page 2
The key part is "Can't process embedded font stream....' That's why your text is going missing.
When I run the same command line using the current HEAD of our Git repository I don't see this error, and the file runs to completion. So it looks like this was a bug which has been fixed.
You have two options;
1) If you don't mind building the code, clone our Git repository, open the Visual Studio solution file, allow VS to update it to your version, then build Ghostscript. Use that binary.
2) As you've already discovered, don't use SAFER. I should caution you that this is a potentially dangerous setup. As long as you are processing files which you created yourself you should be fine, but please don't use this setup to process random files from untrusted sources, you could be laying yourself open to attack.
[edit 2]
And here's a third option. With 9.25 we started shipping the Resource files with Windows, just as we do with Linux. I suspect that if you add -I"c:/program files/gs/gs9.25/Resource/Init" to the beginning of your command line, it will work even when -dSAFER is true.
BTW its useful to quote the messages from the back channel when you have a problem, it may not tell you much, but it has useful information for PostScript developers.
The missing text came back when I removed the parameter -dSAFER. I don't understand why; I can't find the reason in the Ghostscript documentation.
This is my final command line:
gswin64 -dBATCH -dNOPAUSE -r150 -sDEVICE=jpeg -sOutputFile=D:\t\pdf\6\fp%03d.jpg D:\t\pdf\fp.pdf

Inkscape "PDF + Latex" export

I'm using inkscape to produce vector figures, save them in SVG format to export them later as "PDF + Latex" much in the vein of TUG inkscape+pdflatex guide.
Trying to produce a simple figure, however, turns out to be extremely frustating.
The first figure
is an example of the figure I would like to export in the form of "PDF + Latex" (shown here in PNG format).
If I export this to a PDF figure without latex macros the PDF produced looks exactly the same, except for some minor differences with the fonts used to render the text.
When I try to export this using the "PDF + Latex" option the PDF file produced consists on a PDF document of 2 pages (again as .png here):
This, of course, does not looks good when compiling my latex document. So far the guide at TUG has been very helpful, but I still can't produce a working "PDF + Latex" export from inkscape.
What am I doing wrong?
I worked around this by putting all the text in my drawing at the top
select text and then Object -> Raise to top
Inkscape only generates the separate pages if the text is below another object.
I asked this question on the Inkscape online discussion page and got some very helpful guidance from one of the users there.
This is a known bug https://bugs.launchpad.net/ubuntu/+bug/1417470 which was inadvertently introduced in Inkscape 0.91 in an attempt to fix a previous bug https://bugs.launchpad.net/inkscape/+bug/771957.
It seems this bug does two things:
The *.pdf_tex file will have an extra \includegraphics statement which needs to be deleted manually as described in the link to the bug above.
The *.pdf file may be split into multiple pages, regardless of the size of the image. In my case the line objects were split off onto their own page. I worked around this by turning off the text objects (opacity to zero) and then doing a standard PDF export.
If you can execute linux commands, this works:
# Generate the .pdf and .pdf_tex files
inkscape -z -D --file="$SVGFILE" --export-pdf="$PDFFILE" --export-latex
# Fix the number of pages
sed -i 's/\\\\/\n/g' ${PDFFILE}_tex;
MAXPAGE=$(pdfinfo $PDFFILE | grep -oP "(?<=Pages:)\s*[0-9]+" | tr -d " ");
sed -i "/page=$(($MAXPAGE+1))/,\${/page=/d}" ${PDFFILE}_tex;
with:
$SVGFILE: path of the svg
$PDF_FILE: path of the pdf
It is possible to include these commands in a script and execute it automatically when compiling your tex file (so that you don't have to manually export from inkscape each time you modify your svg).
Try it with an illustration that is less wide.
Alternatively, use a wider paperwidth setting.

Ghostscript expand trimbox (add bleed) when converting to jpeg

I have the following command line setup to create a jpeg copy of each page of a pdf, cropping it down to the trim size. Actually i'm wanting the bleed size which is 3mm added to each edge of the trim box
-o newbitmap_%04d.jpg -sDEVICE=jpegcmyk -dJPEGQ=60 -r150 -dSimulateOverprint=false -dUseTrimBox original.pdf
Please could someone suggest how I can do this?
Thanks
David
In the current Ghostscript code, use -dUseBleedBox. See :
http://bugs.ghostscript.com/show_bug.cgi?id=694977

convert pdf to svg

I want to convert PDF to SVG please suggest some libraries/executable that will be able to do this efficiently. I have written my own java program using the apache PDFBox and Batik libraries -
PDDocument document = PDDocument.load( pdfFile );
DOMImplementation domImpl =
GenericDOMImplementation.getDOMImplementation();
// Create an instance of org.w3c.dom.Document.
String svgNS = "http://www.w3.org/2000/svg";
Document svgDocument = domImpl.createDocument(svgNS, "svg", null);
SVGGeneratorContext ctx = SVGGeneratorContext.createDefault(svgDocument);
ctx.setEmbeddedFontsOn(true);
// Ask the test to render into the SVG Graphics2D implementation.
for(int i = 0 ; i < document.getNumberOfPages() ; i++){
String svgFName = svgDir+"page"+i+".svg";
(new File(svgFName)).createNewFile();
// Create an instance of the SVG Generator.
SVGGraphics2D svgGenerator = new SVGGraphics2D(ctx,false);
Printable page = document.getPrintable(i);
page.print(svgGenerator, document.getPageFormat(i), i);
svgGenerator.stream(svgFName);
}
This solution works great but the size of the resulting svg files in huge.(many times greater than the pdf). I have figured out where the problem is by looking at the svg in a text editor. it encloses every character in the original document in its own block even if the font properties of the characters is the same. For example the word hello will appear as 6 different text blocks. Is there a way to fix the above code? or please suggest another solution that will work more efficiently.
Inkscape can also be used to convert PDF to SVG. It's actually remarkably good at this, and although the code that it generates is a bit bloated, at the very least, it doesn't seem to have the particular issue that you are encountering in your program. I think it would be challenging to integrate it directly into Java, but inkscape provides a convenient command-line interface to this functionality, so probably the easiest way to access it would be via a system call.
To use Inkscape's command-line interface to convert a PDF to an SVG, use:
inkscape -l out.svg in.pdf
Which you can then probably call using:
Runtime.getRuntime().exec("inkscape -l out.svg in.pdf")
http://download.oracle.com/javase/1.4.2/docs/api/java/lang/Runtime.html#exec%28java.lang.String%29
I think exec() is synchronous and only returns after the process completes (although I'm not 100% sure on that), so you shoudl be able to just read "out.svg" after that. In any case, Googling "java system call" will yield more info on how to do that part correctly.
Take a look at pdf2svg (also on on github):
To use
pdf2svg <input.pdf> <output.svg> [<pdf page no. or "all" >]
When using all give a filename with %d in it (which will be replaced by the page number).
pdf2svg input.pdf output_page%d.svg all
And for some troubleshooting see:
http://www.calcmaster.net/personal_projects/pdf2svg/
pdftocairo can be used to convert pdf to svg. pdfcairo is part of poppler-utils.
For example to convert 2nd page of a pdf, following command can be run.
pdftocairo -svg -f 1 -l 1 input.pdf
pdftk 82page.pdf burst
sh to-svg.sh
contents of to-svg.sh
#!/bin/bash
FILES=burst/*
for f in $FILES
do
inkscape -l "$f.svg" "$f"
done
I have encountered issues with the suggested inkscape, pdf2svg, pdftocairo, as well as the not suggested convert and mutool when trying to convert large and complex PDFs such as some of the topographical maps from the USGS. Sometimes they would crash, other times they would produce massively inflated files. The only PDF to SVG conversion tool that was able to handle all of them correctly for my use case was dvisvgm. Using it is very simple:
dvisvgm --pdf --output=file.svg file.pdf
It has various extra options for handling how elements are converted, as well as for optimization. Its resulting files can further be compacted by svgcleaner if necessary without perceptual quality loss.
inkscape (#jbeard4) for me produced svgs with no text in them at all, but I was able to make it work by going to postscript as an intermediary using ghostscript.
for page in $(seq 1 `pdfinfo $1.pdf | awk '/^Pages:/ {print $2}'`)
do
pdf2ps -dFirstPage=$page -dLastPage=$page -dNoOutputFonts $1.pdf $1_$page.ps
inkscape -z -l $1_$page.svg $1_$page.ps
rm $1_$page.ps
done
However this is a bit cumbersome, and the winner for ease of use has to go to pdf2svg (#Koen.) since it has that all flag so you don't need to loop.
However, pdf2svg isn't available on CentOS 8, and to install it you need to do the following:
git clone https://github.com/dawbarton/pdf2svg.git && cd pdf2svg
#if you dont have development stuff specific to this project
sudo dnf config-manager --set-enabled powertools
sudo dnf install cairo-devel poppler-glib-devel
#git repo isn't quite ready to ./configure
touch README
autoreconf -f -i
./configure && make && sudo make install
It produces svgs that actually look nicer than the ghostscript-inkscape one above, the font seems to raster better.
pdf2svg $1.pdf $1_%d.svg all
But that installation is a bit much, too much even if you don't have sudo. On top of that, pdf2svg doesn't support stdin/stdout, so the readily available pdftocairo (#SuperNova) worked a treat in these regards, and here's an example of "advanced" use below:
for page in $(seq 1 `pdfinfo $1.pdf | awk '/^Pages:/ {print $2}'`)
do
pdftocairo -svg -f $page -l $page $1.pdf - | gzip -9 >$1_$page.svg.gz
done
Which produces files of the same quality and size (before compression) as pdf2svg, although not binary-identical (and even visually, jumping between output of the two some pixels of letters shift, but neither looks wrong/bad like inkscape did).
Inkscape does not work with the -l option any more. It said "Can't open file: /out.svg (doesn't exist)". The long form that option is in the man page as --export-plain-svg and works but shows a deprecation warning. I was able to fix and update the command by using the -o option on Inkscape 1.1.2-3ubuntu4:
inkscape in.pdf -o out.svg