Combining SVG images programmatically - scripting

I have a program that generates multiple SVG files in batch, which I then need to be able to combine (tiled) into one file, with a set whitespace and set width in cm (or mm).
I need either an existing script or a pointer to which libraries and languages I can use to accomplish this. Any suggestions where to start?

Here are some tools which can help you create an SVG sprite sheet from your SVG files:
svg_stack
svg_utils
And when all is done, you can clean up your SVG with a tool like
Scour
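If you prefer to roll your own, the core idea these tools implement is simple: an outer SVG document can place each generated file as an <image> element at a fixed offset, which gives you direct control over the spacing and the overall width in mm. A minimal hand-rolled sketch (the tile filenames and the 50mm tile size are made up for illustration):
# Combine two generated tiles side by side with a 10mm gap.
# tile1.svg / tile2.svg are hypothetical placeholders for your batch output.
cat > combined.svg <<'EOF'
<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink"
     width="110mm" height="50mm">
  <image xlink:href="tile1.svg" x="0mm" y="0mm" width="50mm" height="50mm"/>
  <image xlink:href="tile2.svg" x="60mm" y="0mm" width="50mm" height="50mm"/>
</svg>
EOF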

Yes, as @victor-henriquez noted, you can use montage, but it's a bit tricky. I got into it by activating the -verbose output, seeing that it constructed an inkscape command, and analysing that solved the issue for me.
montage -version
# Version: ImageMagick 7.0.7-31 Q16 x86_64 20180506
I wanted …
… to label desktop icons: use -label and -pointsize (it is tricky to get the font size right via -pointsize, since it depends on -density)
… to increase -density (it’s tricky to find a suitable number for the output)
… to stack and tile them in an orderly way: use -tile 15x30 (here 15 columns x 30 rows)
… to add a margin to each sub-image: use -geometry '+40+0' (adds 40px horizontally but 0px vertically)
The resulting command was (add -verbose to get detailed processing information):
montage -label '%f' -pointsize 2 -density 300 *.svg \
-tile 15x30 \
-geometry '+40+0' \
./papirus-icons-mimetypes.png
If you additionally specify the desired output pixel size geometry, e.g. 96 by 96 pixels with -geometry '96x96+40+0', it becomes even harder to understand what role -density plays. I failed to figure that out completely ;-)
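For the record, a hedged sketch combining a fixed per-tile output size with an explicit density (the output filename is arbitrary; how -density interacts with the fixed tile size may depend on your ImageMagick version):
montage -density 300 *.svg \
-tile 15x30 \
-geometry '96x96+40+0' \
./icons-96px.png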

I used the Victor gem (https://github.com/DannyBen/victor):
first_svg = File.open("first.svg").read
second_svg = File.open("second.svg").read
# Keep only the inner markup by dropping the first and last lines (the <svg>
# root tags); join with newlines rather than commas so the markup stays valid.
first_content = first_svg.split("\n")[1..-2].join("\n")
second_content = second_svg.split("\n")[1..-2].join("\n")
svg = Victor::SVG.new width: "100%", height: "100%"
svg << first_content
svg << second_content
svg.save 'final.svg'

You can have a look at montage, from ImageMagick: http://www.imagemagick.org/Usage/montage/
You can build your script around it.

How to avoid gray outline artefacts when converting an eps image to pdf?

To generate vector graphics figures with LaTeX labels, I use gnuplot and the cairolatex terminal, creating the image via plot "data.txt" u 1:2:3 matrix with image notitle followed by:
latex figuregen.tex
dvips -E -ofile.eps figuregen
# Correct the bounding box automatically:
epstool --copy --bbox file.eps filename.eps
## Create a pdf:
ps2pdf -dPDFSETTINGS=/prepress -dSubsetFonts=true -dEmbedAllFonts=true -dMaxSubsetPct=100 -dCompatibilityLevel=1.3 -dEPSCrop filename.eps filename.pdf
Here is a zoom on a specific region of the original eps image:
White regions actually correspond to NaN values in the data file.
Now using the pdf file converted from eps:
In the pdf version, there are now unwanted outlines around all the NaN pixels, creating an awful lot of noise in the higher portion of the image.
I want to have these images as pdf, free of artefacts, and preserve high-quality LaTeX labels. I suspect that there might be a ps2pdf option to deactivate this kind of unwanted behaviour but I just cannot find it.
I tried things such as: -dGraphicsAlphaBits=1, -dNOINTERPOLATE, -dALLOWPSTRANSPARENCY, -dNOTRANSPARENCY, -dCompatibilityLevel=1.4 or -dCompatibilityLevel=1.5, but without success.
I also tried fixing this directly in gnuplot, but without success (see e.g. below).
Would any of you know how to solve this issue?
Thank you very much for your time!
EDIT
What's even more surprising and problematic is that these artefacts also appear when printed.
Note however that they do not appear at extreme levels of zoom in evince when only a small part of the data set is plotted.
MWE:
# plot.plt
set size ratio -1
set palette defined ( 0 '#D73027', 1 '#F46D43', 2 '#FDAE61', 3 '#FEE090', 4 '#FFFFD9', 5 '#E0F3F8', 6 '#ABD9E9', 7 '#74ADD1', 8 '#4575B4' )
#set yr [300:0] ### no artefacts if zoom is higher than 1310% in evince
set yr [400:100] ### no artefacts if zoom is higher than 1780% in evince
#set yr [450:0] ### artefacts at all zoom levels if we show more data, or all of it
set term cairolatex dashed color; set output "temp.tex"
plot "data.txt" u 1:2:3 matrix with image notitle
set output #Closes the temporary output file.
!sed -e 's|/Title|%/Title|' -e 's|/Subject|%/Subject|' -e 's|/Creator|%/Creator|' -e 's|/Author|%/Author|' < temp.tex > graph.tex
and, for completeness:
% figuregen.tex
\documentclass[dvips]{article}
\pagestyle{empty}
\usepackage[dvips]{graphicx} %
\begin{document}
\input graph.tex
\end{document}
If needed, part of the data can be found in text form here; enough to reproduce the issue: https://paste.nomagic.uk/?e0343cc8f759040a#DkRxNiNrH6d3QMZ985CxhA21pG2HpVEShrfjg84uvAdt
EDIT 2
In fact, the same artefact issues appear when using set terminal cairolatex pdf:
set terminal cairolatex standalone pdf size 16cm,10.5cm dashed transparent
set output "plot.tex"
and compiling directly with pdflatex:
gnuplot<plot.plt
pdflatex plot.tex
(Note, this is using Gnuplot Version 5.2 patchlevel 6).
The actual problem is that NaN values are set to transparent black pixels (#00000000).
The transparency causes these gray outline artifacts, depending on the zoom level. If you zoom in close enough, you see no artifacts.
But as soon as the image pixels are smaller than your monitor pixels, the values are interpolated for screen display. It seems that PDF viewers like evince (I also tested okular and mupdf) interpolate both the color and the alpha channel, so the alpha value of the NaN pixels is changed, and the underlying black shows through as a gray border around the colored pixels.
I tried several approaches. The easiest one that actually worked for me was to use the tikz terminal with the option externalimages, which saves images created with `with image` as separate PNG files.
These PNG files also contain transparency, so by themselves they give a final result with the same artifacts.
But you can use ImageMagick's convert to change the transparent NaN pixels of the PNG to white with
convert temp.001.png -alpha off -fill white -opaque black temp.001.png
So, a fully working plot file is
# plot.plt
set size ratio -1
set palette defined ( 0 '#D73027', 1 '#F46D43', 2 '#FDAE61', 3 '#FEE090', 4 '#FFFFD9', 5 '#E0F3F8', 6 '#ABD9E9', 7 '#74ADD1', 8 '#4575B4' )
set ytics 100
set yrange reverse
set term tikz standalone externalimages background "white"; set output "temp.tex"
plot "data.txt" u 1:2:3 matrix with image notitle
# temp.001.png is the external image which contains only the 'with image' part
# We must remove the #00000000 color, which represents the NaN pixels
# I couldn't replace the colors directly, but I could first remove the alpha channel
# and then change black to white, because no other black pixel appear
!convert temp.001.png -alpha off -fill white -opaque black temp.001.png
set output #Closes the temporary output file.
!sed -e 's|/Title|%/Title|' -e 's|/Subject|%/Subject|' -e 's|/Creator|%/Creator|' -e 's|/Author|%/Author|' < temp.tex > graph.tex
!pdflatex graph.tex
Mupdf screen shot for graph.pdf:
Note that I used standalone to be able to compile the resulting file directly, so that I could check the result.
A more cumbersome alternative would be to "manually" plot with image to a png file, and include that in a second plot, like I described in Big data surface plots: Call gnuplot from tikz to generate bitmap and include automatically? Then you can have more influence on how the png is generated.
Just for the record, `with image pixels` seems to do the "trick" and will create a file without gray surroundings around NaN datapoints. Tested with gnuplot 5.2.6.
plot FILE u 1:2:3 matrix with image pixels notitle
Code:
### avoid shading around NaN datapoints
reset session
set size ratio -1
FILE = "data.txt"
set palette defined ( 0 '#D73027', 1 '#F46D43', 2 '#FDAE61', 3 '#FEE090', 4 '#FFFFD9', 5 '#E0F3F8', 6 '#ABD9E9', 7 '#74ADD1', 8 '#4575B4' )
set term cairolatex dashed color
set output "temp.tex"
plot FILE u 1:2:3 matrix with image pixels notitle
set output
### end of code
Result: (a PNG of a screenshot, since it looks like I cannot add a PDF here)

Add white border to PDF (change paper format)

I have to change a given PDF from A4 (210mm*297mm) to 216mm*303mm.
The additional 6 mm for each dimension should be set as white border of 3mm on each side. The original content of the PDF pages should be centered on the output pages.
I tried with convert:
convert in.pdf -bordercolor "#FFFFFF" -border 9 out.pdf
This gives me exactly the needed result, but I lose a lot of sharpness in the original images of the PDF. Everything is kind of blurry.
I also checked with
convert in.pdf out.pdf
which makes no changes at all but still degrades the images.
So I tried Ghostscript but did not get any useful result. The best approach I found so far, from a German site, is:
gs -sOutputFile=out.pdf -sDEVICE=pdfwrite -g6120x8590 \
-c "<</Install{1 1 scale 8.5 8.5}>> setpagedevice" \
-dNOPAUSE -dBATCH in.pdf
but I get Error: /typecheck in --.postinstall--.
By default, ImageMagick rasterizes input PDF files at 72 dpi. That is an awfully low resolution, as you experienced firsthand. The output of ImageMagick is always a raster image, so if your input PDF contained text, it will no longer be text.
If you don't mind the output PDFs getting bigger, you can simply increase the resolution at which ImageMagick samples the original PDF using the -density option, like this:
convert -density 600 in.pdf -bordercolor "#FFFFFF" -border 9 out.pdf
I used 600 because it is the sweet spot that works well for OCR. I recommend trying 300, 450, 600, 900 and 1200 and picking the best one that doesn't get unwieldy.
Shifting the content on the media is not especially hard, but it does mean altering the content stream of the PDF file, which most PDF manipulation packages avoid, with good reason.
The code you quote above really won't work: it leaves garbage on the operand stack, and the PLRM explicitly states that the Install procedure is followed by an implicit initgraphics, which will reset all the standard parameters anyway.
You could try instead setting a /BeginPage procedure to translate the origin, which will probably work:
<</BeginPage {8.5 8.5 translate} >> setpagedevice
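A hedged sketch of the full invocation (using the media size from the question; -o is shorthand for -dBATCH -dNOPAUSE -sOutputFile, and -f ends the -c token list before the input file):
gs -o out.pdf -sDEVICE=pdfwrite -g6120x8590 \
-c "<</BeginPage {8.5 8.5 translate}>> setpagedevice" \
-f in.pdf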
Note that you aren't simply manipulating the original PDF file; Ghostscript takes the original PDF file, interprets it into graphics primitives, then reassembles those primitives into a new PDF file, this has implications... For example, if an image is DCT encoded (a JPEG) in the original, it will be decompressed before being passed into the output file. You probably don't want to reapply DCT encoding as this will introduce visible artefacts.
A simpler alternative, but involving multiple processing steps and therefore more potential for problems, is to first convert the PDF to PostScript with the ps2write device, specifying your media size and the -dCenterPages switch, then use the pdfwrite device to turn the resulting PostScript into a new PDF file, as sketched below.
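A hedged sketch of that two-step route (612x859 pt approximates the 216mm x 303mm target; -dCenterPages is a ps2write-specific switch, so check it against your Ghostscript version):
gs -o centered.ps -sDEVICE=ps2write \
-dDEVICEWIDTHPOINTS=612 -dDEVICEHEIGHTPOINTS=859 -dFIXEDMEDIA \
-dCenterPages in.pdf
gs -o out.pdf -sDEVICE=pdfwrite centered.ps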
Instead of
-g6120x8590 \
-c "<</Install{1 1 scale 8.5 8.5}>> setpagedevice"
(which is wrong), you should use:
-g6120x8590 \
-c "<</Install{8.5 8.5 translate}>> setpagedevice"
or
-g6120x8590 \
-c "<</Install{3 25.4 div 72 mul dup translate}>> setpagedevice"
(which lets Ghostscript calculate the "3mm == 8.5pt" itself...)

Convert multipage PDF to PNG and back (Linux)

I have a lot of PDF documents that I want to convert to PNG, edit in GIMP, and then save back to a multipage Acrobat file. I'm filling out forms and adding a scanned signature, trying to avoid printing, signing, and scanning back in, while keeping the ability to type the information I need to enter.
I've been trying to use ImageMagick to convert to PNG files, which seems to work fine. I use the command convert -quality 100 -density 300x300 multipage.pdf single%d.png
(I'm not really sure if the quality parameter is right for PNG.)
But I'm having problems with saving back to PDF. Some of the files have the wrong page size, and I've tried every command and procedure I can find, but there are always a few odd sizes. The resolution seems to vary so that it looks good at a certain zoom level, but either a few pages come out about 2" wide, or they are 8.5x11 while the others are about 35" wide. I've tried making sure GIMP had the canvas size and resolution correct, and saving the resolution in the file, but that doesn't seem to matter.
The command I use to save the files is convert -page letter -adjoin single*.png multipage.pdf. I've tried other parameters, but none seemed to matter.
If anyone has any ideas or alternatives, I'd appreciate it.
"I'm not really sure if the quality parameter is right for PNG."
For PNG output, the -quality setting is very unlike JPEG's quality setting (which is simply an integer from 0 to 100).
For PNG it is composed of two single digits:
The first digit (tens) is (largely) the zlib compression level, and it may go from 0 to 9.
(However the setting of 0 has a special meaning: when you use it you'll get Huffman compression, not zlib compression level 0. This is often better... Weird but true.)
The second digit is the PNG data encoding filter type (before it is compressed):
0 is none,
1 is "sub",
2 is "up",
3 is "average",
4 is "Paeth", and
5 is "adaptive".
In practical terms that means:
For illustrations with solid sequences of color a "none" filter (-quality 00) is typically the most appropriate.
For photos of natural landscapes an "adaptive" filtering (-quality 05) is generally the best.
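A quick illustration of composing the two digits (filenames are hypothetical):
# tens digit 0 = Huffman coding, ones digit 0 = "none" filter: good for flat-color art
convert diagram.png -quality 00 diagram-opt.png
# tens digit 0 = Huffman coding, ones digit 5 = "adaptive" filter: good for photos
convert photo.png -quality 05 photo-opt.png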
"I'm having problems with saving back to PDF. Some of the files have the wrong page size, and I've tried every command and procedure I can find [...] but either a few pages are specified at about 2" wide, or they are 8.5x11 but the others are about 35" wide."
Not having your PNG files available, I created a few simple ones with different dimensions to verify the different commands (as I wasn't sure myself any more). Indeed, the one you used:
convert -page letter -adjoin single*.png multipage.pdf
does create all PDF pages in the same letter size, but it places my sample of (differently sized) PNGs always in the lower left corner of the PDF page. (Should a PNG exceed the PDF page size, it is scaled down to fit -- but smaller PNGs are not scaled up to fill the available page space.)
The following modification to the command will place the PNGs into the center of each PDF page:
convert \
-page letter \
-adjoin \
single*.png \
-gravity center \
multipage.pdf
If this is still not good enough for you, you can enforce a (possibly non-proportional!) scaling to almost fill the letter area by adding a -scale '590x770!' parameter (this will leave a border of 11 pt at each edge of the page):
convert \
-page letter \
-adjoin \
single*.png \
-gravity center \
-scale '590x770!' \
multipage.pdf
To leave out the extra border, use -scale '612x792!'. -- Should you want scaling to happen only upward (enlarging smaller PNGs if required) while keeping the aspect ratio, use -scale '590x770<':
convert \
-page letter \
-adjoin \
single*.png \
-gravity center \
-scale '590x770<' \
multipage.pdf
Why not just use Xournal? That's what I use to annotate PDFs.

Image Magick: Image optimization for websites

I have a camera which produces photographs of 3008x2000 pixels. I use Image Magick to scale and resize the photos to be put up on my website. The size of the images I am using on the website is 602x400. I use this command to reduce the size:
convert DSC_0124.JPG -scale 20% -size 24% img1.jpg
This produces an image which is 602x400 pixels in size, but the file size is always above 250KB. More images on a single HTML page means the page will be heavier and the loading time longer. Are there any features in ImageMagick that will help me keep the file size as small as possible, ideally below 100KB, while the image dimensions stay the same, that is, 602x400px? I have achieved similar optimisation with the SEAmonster tool for MS Windows. As it doesn't have a command-line alternative, it wouldn't be of much help when there are hundreds of images to convert.
Use the command as Delan proposed, with the additional -strip flag to remove EXIF data; this has reduced the size of some of my images drastically. Here is a bash script for Unix platforms; for individual images you can use the convert command on its own.
for X in *.jpg; do convert "$X" -resize 602x400 -strip -quality 86 "$X"; done
This will convert all images in the directory.
Use -quality to set the compression level:
convert DSC_0124.JPG -scale 20% -size 24% -quality [0..100] img1.jpg
You can define the maximum size of the output image at 100KB like this:
convert DSC_0124.JPG -resize 602x400! -strip -define jpeg:extent=100KB img1.jpg
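A hedged batch variant combining the 100KB cap with the loop shown earlier (writing copies with a web_ prefix instead of overwriting the originals):
for X in *.jpg; do convert "$X" -resize 602x400 -strip -define jpeg:extent=100KB "web_$X"; done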
If you are running your website on PHP, you might want to consider the SLIR image resizing script; it does a great job resizing to various constraints (see below) and caches the results.
Parameters:
w Maximum width
h Maximum height
c Crop ratio
q Quality
b Background fill color
p Progressive
http://shiftingpixel.com/2008/03/03/smart-image-resizer/
http://code.google.com/p/smart-lencioni-image-resizer/

Convert PDF to image with high resolution

I'm trying to use the command line program convert to take a PDF into an image (JPEG or PNG). Here is one of the PDFs that I'm trying to convert.
I want the program to trim off the excess white-space and return a high enough quality image that the superscripts can be read with ease.
This is my current best attempt. As you can see, the trimming works fine, I just need to sharpen up the resolution quite a bit. This is the command I'm using:
convert -trim 24.pdf -resize 500% -quality 100 -sharpen 0x1.0 24-11.jpg
I've tried to make the following conscious decisions:
resize it larger (has no effect on the resolution)
make the quality as high as possible
use the -sharpen (I've tried a range of values)
Any suggestions please on getting the resolution of the image in the final PNG/JPEG higher would be greatly appreciated!
It appears that the following works:
convert \
-verbose \
-density 150 \
-trim \
test.pdf \
-quality 100 \
-flatten \
-sharpen 0x1.0 \
24-18.jpg
It results in the left image. Compare this to the result of my original command (the image on the right):
(To really see and appreciate the differences between the two, right-click on each and select "Open Image in New Tab...".)
Also keep the following facts in mind:
The worse, blurry image on the right has a file size of 1,941,702 bytes (1.85 MB).
Its resolution is 3060x3960 pixels, using 16-bit RGB color space.
The better, sharp image on the left has a file size of 337,879 bytes (330 kB).
Its resolution is 758x996 pixels, using 8-bit gray color space.
So, no need to resize; just add the -density flag. The density value of 150 is weird -- trying a range of values results in a worse-looking image in both directions!
Personally I like this.
convert -density 300 -trim test.pdf -quality 100 test.jpg
It's a little over twice the file size, but it looks better to me.
-density 300 sets the dpi that the PDF is rendered at.
-trim removes any edge pixels that are the same color as the corner pixels.
-quality 100 sets the JPEG compression quality to the highest quality.
Things like -sharpen don't work well with text because they undo things your font rendering system did to make it more legible.
If you actually want the image blown up, use -resize here, possibly with a larger dpi value of something like targetDPI * scalingFactor. That will render the PDF at the resolution/size you intend.
Descriptions of the parameters on imagemagick.org are here
I use pdftoppm on the command line to get the initial image, typically with a resolution of 300dpi, so pdftoppm -r 300, then use convert to do the trimming and PNG conversion.
I really haven't had good success with convert [update May 2020: actually, it pretty much never works for me], but I've had EXCELLENT success with pdftoppm. Here are a couple of examples of producing high-quality images from a PDF:
[Produces ~25 MB-sized files per pg] Output uncompressed .tif file format at 300 DPI into a folder called "images", with files being named pg-1.tif, pg-2.tif, pg-3.tif, etc:
mkdir -p images && pdftoppm -tiff -r 300 mypdf.pdf images/pg
[Produces ~1MB-sized files per pg] Output in .jpg format at 300 DPI:
mkdir -p images && pdftoppm -jpeg -r 300 mypdf.pdf images/pg
[Produces ~2MB-sized files per pg] Output in .jpg format at highest quality (least compression) and still at 300 DPI:
mkdir -p images && pdftoppm -jpeg -jpegopt quality=100 -r 300 mypdf.pdf images/pg
For more explanations, options, and examples, see my full answer here:
https://askubuntu.com/questions/150100/extracting-embedded-images-from-a-pdf/1187844#1187844
Related:
[How to turn a PDF into a searchable PDF w/pdf2searchablepdf] https://askubuntu.com/questions/473843/how-to-turn-a-pdf-into-a-text-searchable-pdf/1187881#1187881
Cross-linked:
How to convert a PDF into JPG with command line in Linux?
https://unix.stackexchange.com/questions/11835/pdf-to-jpg-without-quality-loss-gscan2pdf/585574#585574
Normally I extract the embedded image with pdfimages at the native resolution, then use ImageMagick's convert to get the needed format:
$ pdfimages -list fileName.pdf
$ pdfimages fileName.pdf fileName # save in .ppm format
$ convert fileName-000.ppm fileName-000.png
This generates the best and smallest result file.
Note: for lossy JPG embedded images, you have to use -j:
$ pdfimages -j fileName.pdf fileName # save in .jpg format
With a recent poppler-utils (0.50+, 2016) you can use -all, which saves lossy images as JPG and lossless ones as PNG, so a simple
$ pdfimages -all fileName.pdf fileName
always extracts the best possible quality content from the PDF.
On the poorly provided-for Windows platform, you can download a recent (0.68, 2018) poppler-utils binary from:
http://blog.alivate.com.au/poppler-windows/
In ImageMagick, you can do "supersampling". You specify a large density and then resize down as much as desired for the final output size. For example with your image:
convert -density 600 test.pdf -background white -flatten -resize 25% test.png
Download the image to view it at full resolution for comparison.
I do not recommend saving to JPG if you are expecting to do further processing.
If you want the output to be the same size as the input, then resize to the inverse of the ratio of your density to 72. For example, -density 288 and -resize 25%. 288=4*72 and 25%=1/4
The larger the density the better the resulting quality, but it will take longer to process.
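For example, applying the inverse-ratio rule above to keep the nominal page size (file names are placeholders):
# 288 dpi = 4 x 72 dpi, so resize to 25% = 1/4 to return to the nominal size
convert -density 288 input.pdf -resize 25% output.png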
I have found it both faster and more stable when batch-processing large PDFs into PNGs and JPGs to use the underlying gs (aka Ghostscript) command that convert uses.
You can see the command in the output of convert -verbose and there are a few more tweaks possible there (YMMV) that are difficult / impossible to access directly via convert.
However, it would be harder to do your trimming and sharpening using gs, so, as I said, YMMV!
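As a hedged starting point, a direct Ghostscript call of this kind looks roughly like the following (these are standard Ghostscript flags; the exact command convert generates will differ):
# -o implies -dBATCH -dNOPAUSE; png16m is 24-bit color PNG output
gs -sDEVICE=png16m -r300 -o page-%03d.png input.pdf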
It also gives you good results:
exec("convert -geometry 1600x1600 -density 200x200 -quality 100 test.pdf test_image.jpg");
Linux user here: I tried the convert command-line utility (for PDF to PNG) and I was not happy with the results. I found this to be easier, with a better result:
extract the pdf page(s) with pdftk
e.g.: pdftk file.pdf cat 3 output page3.pdf
open (import) that pdf with GIMP
important: change the import Resolution from 100 to 300 or 600 pixel/in
in GIMP export as PNG (change file extension to .png)
Edit:
Added picture, as requested in the Comments. Convert command used:
convert -density 300 -trim struct2vec.pdf -quality 100 struct2vec.png
GIMP : imported at 300 dpi (px/in); exported as PNG compression level 3.
I have not used GIMP on the command line (re: my comment, below).
For Windows (tested on W11):
magick.exe -verbose -density 150 "input.pdf" -quality 100 -sharpen 0x1.0 output.jpg
You need to install:
ImageMagick: https://imagemagick.org/index.php
Ghostscript: https://www.ghostscript.com/releases/gsdnld.html
Additional info:
Watch out when using the -flatten parameter, since it can produce only the first page as an image
Use the -scene 1 parameter to start the image names at index 1 (see the sketch below)
The convert command mentioned in the question has been deprecated in favor of magick
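Putting those notes together, a hedged sketch for Windows (the output name pattern is arbitrary):
magick -density 150 input.pdf -scene 1 -quality 100 page-%d.jpg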
One more suggestion is that you can use GIMP.
Just load the PDF file in GIMP, save it as .xcf, and then you can do whatever you want to the image.
I have used pdf2image, a simple Python library that works like a charm.
If you are on a non-Linux machine, first install Poppler: you can just download the zip, unzip it into Program Files, and add the bin folder to the machine PATH.
After that you can use pdf2image in a Python class like this:
from pdf2image import convert_from_path, convert_from_bytes

# inputfile and outputpath are variables holding the PDF path and output folder
images_from_path = convert_from_path(
    inputfile,
    output_folder=outputpath,
    grayscale=True, fmt='jpeg')
I am not good with Python but was able to make an exe of it.
Later you may use the exe with file input and output parameters. I have used it from C# and things are working fine.
Image quality is good. OCR works fine.
Edited:
Here is another finding: you don't need to install Poppler for the conversion.
Just make your converter.exe from the Python script and place it in the bin folder of the Poppler Windows package.
I suppose it will work on Azure as well.
The PNG file you attached looks really blurred. If you need additional post-processing for each image you generate as a PDF preview, that will decrease the performance of your solution.
2JPEG can convert the PDF file you attached to a nice sharp JPG and crop the empty margins in one call:
2jpeg.exe -src "C:\In\*.*" -dst "C:\Out" -oper Crop method:autocrop
I use ICEpdf, an open source Java PDF engine. Check out the office demo.
package image2pdf;

import org.icepdf.core.exceptions.PDFException;
import org.icepdf.core.exceptions.PDFSecurityException;
import org.icepdf.core.pobjects.Document;
import org.icepdf.core.pobjects.Page;
import org.icepdf.core.util.GraphicsRenderingHints;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.awt.image.RenderedImage;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;

public class pdf2image {
    public static void main(String[] args) {
        Document document = new Document();
        try {
            document.setFile("C:\\Users\\Dell\\Desktop\\test.pdf");
        } catch (PDFException ex) {
            System.out.println("Error parsing PDF document " + ex);
        } catch (PDFSecurityException ex) {
            System.out.println("Error encryption not supported " + ex);
        } catch (FileNotFoundException ex) {
            System.out.println("Error file not found " + ex);
        } catch (IOException ex) {
            System.out.println("Error IOException " + ex);
        }

        // Save page captures to file.
        float scale = 1.0f;
        float rotation = 0f;

        // Paint each page's content to an image and write the image to file.
        for (int i = 0; i < document.getNumberOfPages(); i++) {
            try {
                BufferedImage image = (BufferedImage) document.getPageImage(
                        i, GraphicsRenderingHints.PRINT, Page.BOUNDARY_CROPBOX, rotation, scale);
                RenderedImage rendImage = image;
                try {
                    System.out.println(" capturing page " + i);
                    File file = new File("C:\\Users\\Dell\\Desktop\\test_imageCapture1_" + i + ".png");
                    ImageIO.write(rendImage, "png", file);
                } catch (IOException e) {
                    e.printStackTrace();
                }
                image.flush();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        // Clean up resources.
        document.dispose();
    }
}
I've also tried ImageMagick and pdftoppm; both pdftoppm and ICEpdf produce a higher resolution than ImageMagick.
Please take note before downvoting: this solution is for GIMP using a graphical interface, not for ImageMagick using a command line, but it worked perfectly fine for me as an alternative, and that is why I found it worth sharing here.
Follow these simple steps to extract images in any format from PDF documents:
Download the GIMP Image Manipulation Program
Open the program after installation
Open the PDF document that you want to extract images from
Select only the pages of the PDF document that you want to extract images from.
N.B.: If you need only the cover images, select only the first page.
Click Open after selecting the pages you want to extract images from
Click on the File menu once the pages are open in GIMP
Select Export As in the File menu
Select your preferred file type by extension (say png) below the dialog box that pops up.
Click on Export to export your image to your desired location.
You can then check your file explorer for the exported image.
That's all.
I hope this helps.
Get an image from a PDF in iOS (Swift):
import UIKit
import PDFKit

func imageFromPdf(pdfUrl: URL, atIndex index: Int, closure: @escaping ((UIImage) -> Void)) {
    autoreleasepool {
        // Instantiate a `PDFDocument` from the PDF file's URL.
        guard let document = PDFDocument(url: pdfUrl) else { return }
        // Get the requested page of the PDF document.
        guard let page = document.page(at: index) else { return }
        // Fetch the page rect for the page we want to render.
        let pageRect = page.bounds(for: .mediaBox)
        let renderer = UIGraphicsImageRenderer(size: pageRect.size)
        let img = renderer.image { ctx in
            // Set and fill the background color.
            UIColor.white.set()
            ctx.fill(CGRect(x: 0, y: 0, width: pageRect.width, height: pageRect.height))
            // Translate the context so that we only draw the `cropRect`.
            ctx.cgContext.translateBy(x: -pageRect.origin.x, y: pageRect.size.height - pageRect.origin.y)
            // Flip the context vertically because the Core Graphics coordinate system starts from the bottom.
            ctx.cgContext.scaleBy(x: 1.0, y: -1.0)
            // Draw the PDF page.
            page.draw(with: .mediaBox, to: ctx.cgContext)
        }
        closure(img)
    }
}

// Usage
let pdfUrl = URL(fileURLWithPath: "PDF URL")
imageFromPdf(pdfUrl: pdfUrl, atIndex: 0) { imageIS in
    // Use the rendered UIImage here.
}
Many answers here concentrate on using magick (or its dependency Ghostscript) as set by the OP's question, with a few suggesting GIMP as an alternative, without describing why some settings may work best for different cases.
Taking the OP's sample, the requirement is a crisp trimmed image, as small as possible yet retaining good readability. Here the result is 96 dpi in 58 KB (a very small increase on the 54 KB vector source), yet it retains a good image even when zoomed in. Compare that with the 72 dpi image (226 KB) in the accepted answer above.
The key point is that any image processor can be scripted to batch-run from the command line using a profile as input; here IrfanView (with or without Ghostscript) is set to auto-crop the PDF page(s) and output PNG at a default 96 dpi, using only 4 bits per pixel for 16 shades of grey.
The size could be reduced further by dropping the resolution to 72 dpi, but 96 dpi is an optimal setting for PNG screen display.
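As a hedged illustration of such a scripted run, IrfanView's command line accepts options along these lines (verify the exact switches against i_view64 /? for your version, and note that PDF input needs the Ghostscript plugin):
i_view64.exe input.pdf /dpi=(96,96) /bpp=4 /convert=output.png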
Use this commandline:
convert -geometry 3600x3600 -density 300x300 -quality 100 TEAM\ 4.pdf team4.png
This should correctly convert the file as you've asked for.
The following Python script will work on any Mac (Snow Leopard and upward). It can be used on the command line with successive PDF files as arguments, or you can put it into a Run Shell Script action in Automator and make a Service (Quick Action in Mojave).
You can set the resolution of the output image in the script.
The script and a Quick Action can be downloaded from github.
#!/usr/bin/python
# coding: utf-8

import os, sys
import Quartz as Quartz
from LaunchServices import (kUTTypeJPEG, kUTTypeTIFF, kUTTypePNG, kCFAllocatorDefault)

resolution = 300.0  # dpi
scale = resolution / 72.0

cs = Quartz.CGColorSpaceCreateWithName(Quartz.kCGColorSpaceSRGB)
whiteColor = Quartz.CGColorCreate(cs, (1, 1, 1, 1))
# Options: kCGImageAlphaNoneSkipLast (no trans), kCGImageAlphaPremultipliedLast
transparency = Quartz.kCGImageAlphaNoneSkipLast

# Save image to file
def writeImage(image, url, type, options):
    destination = Quartz.CGImageDestinationCreateWithURL(url, type, 1, None)
    Quartz.CGImageDestinationAddImage(destination, image, options)
    Quartz.CGImageDestinationFinalize(destination)
    return

def getFilename(filepath):
    i = 0
    newName = filepath
    while os.path.exists(newName):
        i += 1
        newName = filepath + " %02d" % i
    return newName

if __name__ == '__main__':
    for filename in sys.argv[1:]:
        pdf = Quartz.CGPDFDocumentCreateWithProvider(Quartz.CGDataProviderCreateWithFilename(filename))
        numPages = Quartz.CGPDFDocumentGetNumberOfPages(pdf)
        shortName = os.path.splitext(filename)[0]
        prefix = os.path.splitext(os.path.basename(filename))[0]
        folderName = getFilename(shortName)
        try:
            os.mkdir(folderName)
        except:
            print "Can't create directory '%s'" % (folderName)
            sys.exit()
        # For each page, create a file
        for i in range(1, numPages + 1):
            page = Quartz.CGPDFDocumentGetPage(pdf, i)
            if page:
                # Get mediabox
                mediaBox = Quartz.CGPDFPageGetBoxRect(page, Quartz.kCGPDFMediaBox)
                x = Quartz.CGRectGetWidth(mediaBox)
                y = Quartz.CGRectGetHeight(mediaBox)
                x *= scale
                y *= scale
                r = Quartz.CGRectMake(0, 0, x, y)
                # Create a bitmap context, draw a white background and add the PDF
                writeContext = Quartz.CGBitmapContextCreate(None, int(x), int(y), 8, 0, cs, transparency)
                Quartz.CGContextSaveGState(writeContext)
                Quartz.CGContextScaleCTM(writeContext, scale, scale)
                Quartz.CGContextSetFillColorWithColor(writeContext, whiteColor)
                Quartz.CGContextFillRect(writeContext, r)
                Quartz.CGContextDrawPDFPage(writeContext, page)
                Quartz.CGContextRestoreGState(writeContext)
                # Convert to an "Image"
                image = Quartz.CGBitmapContextCreateImage(writeContext)
                # Create unique filename per page
                outFile = folderName + "/" + prefix + " %03d.png" % i
                url = Quartz.CFURLCreateFromFileSystemRepresentation(kCFAllocatorDefault, outFile, len(outFile), False)
                # kUTTypeJPEG, kUTTypeTIFF, kUTTypePNG
                type = kUTTypePNG
                # See the full range of image properties on Apple's developer pages.
                options = {
                    Quartz.kCGImagePropertyDPIHeight: resolution,
                    Quartz.kCGImagePropertyDPIWidth: resolution
                }
                writeImage(image, url, type, options)
                del page
You can do it in LibreOffice Draw (which is usually preinstalled in Ubuntu):
Open PDF file in LibreOffice Draw.
Scroll to the page you need.
Make sure text/image elements are placed correctly. If not, you can adjust/edit them on the page.
Top menu: File > Export...
Select the image format you need in the bottom-right menu. I recommend PNG.
Name your file and click Save.
An options window will appear, so you can adjust resolution and size.
Click OK, and you are done.
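Recent LibreOffice builds can also do the same conversion headlessly from a terminal, which helps for batches; a minimal sketch (the resolution then comes from the PDF import defaults rather than the options dialog):
soffice --headless --convert-to png file.pdf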
convert -density 300 * airbnb.pdf
Looked perfect to me
It's actually pretty easy to do with Preview on a Mac. All you have to do is open the file in Preview and save as (or export) a PNG or JPEG, but make sure that you use at least 300 dpi at the bottom of the window to get a high-quality image.
This works for creating a single file from multiple PDF and image files:
php exec('convert -density 300 -trim "/path/to/input_filename_1.png" "/path/to/input_filename_2.pdf" "/path/to/input_filename_3.png" -quality 100 "/path/to/output_filename_0.pdf"');
WHERE:
-density 300 = dpi
-trim = trims edge pixels that match the corner color - makes edges look smooth, it seems
-quality 100 = quality vs compression (100% quality)
-flatten = do not use -flatten for multi-page output