Tile/concatenate high resolution PDF files with imagemagick - pdf

I've 9 high quality PDF files. I want to merge them into one large PDF of 3x3. I then want to turn this into a PNG file. I want to keep the resolution/sharpness during this process so that on the resulting PNG I can zoom right in and still see the fine detail. I thought I might do this with imagemagick but I'm struggling. Any ideas please?
I've tried this to merge them together to start with. It works, but the quality doesn't remain.
montage input_*.pdf -background none -tile 3x3 -geometry +0+0 output.pdf
Please note that file size and size of resulting image isn't an issue. I've no need to print it or anything like that. It's for viewing on a computer only.
Here is a sample of three of the PDF files:
1) https://www.dropbox.com/s/qc094jg1nkfk0jw/input_1.pdf?dl=0
2) https://www.dropbox.com/s/gb4u8r7bxg8lw2r/input_2.pdf?dl=0
3) https://www.dropbox.com/s/97dhi42wrvfxfd2/input_3.pdf?dl=0
Each PDF is 1071 x 1800 pts (using pdfinfo).
Thanks
James

Rather than sticking with PDF, merging, and then converting to PNG, you may be better off extracting the images as PNG in the first place and then concatenating the PNG files like this:
pdfimages -png input_1.pdf a
pdfimages -png input_2.pdf a
pdfimages -png input_3.pdf a
# Combine them side by side
montage a-*.png -background none -tile 3x3 -geometry +0+0 output.png
# Or combine with "convert"
convert a-*.png +append result.png
The second document seems to have a mask...
pdfimages -list input_1.pdf
page num type width height color comp bpc enc interp object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
1 0 image 12000 20167 icc 3 8 image no 9 0 807 807 1260K 0.2%
pdfimages -list input_2.pdf
page num type width height color comp bpc enc interp object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
1 0 image 12000 20167 icc 3 8 image no 9 0 807 807 5781K 0.8%
1 1 smask 12000 20167 gray 1 8 image no 9 0 807 807 230K 0.1%
pdfimages -list input_3.pdf
page num type width height color comp bpc enc interp object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
1 0 image 12001 20167 icc 3 8 image no 9 0 807 807 2619K 0.4%
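The x-ppi and y-ppi columns can be cross-checked against the page size reported by pdfinfo (1071 x 1800 pts). A point is 1/72 inch, so the embedded resolution is the pixel count divided by the page size in inches. Here is a minimal Python sketch of that check; the function name is made up for illustration, and it assumes the image fills the whole page:

```python
def embedded_ppi(pixels: int, page_points: float) -> float:
    """Pixels per inch of an image spanning page_points (1 pt = 1/72 in)."""
    return pixels / (page_points / 72.0)

# input_1.pdf: a 12000 x 20167 px image on a 1071 x 1800 pt page
x_ppi = embedded_ppi(12000, 1071)
y_ppi = embedded_ppi(20167, 1800)
print(round(x_ppi), round(y_ppi))  # 807 807
```

The same figure is roughly the density you would need if you rasterized the page instead of extracting the embedded image, which is why a default low-density render looks soft by comparison.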

Related

ImageMagick converts PDF into tiny images despite setting density and resize options

I'm using ImageMagick to convert the following PDF to a PNG file: PDF from IMSLP (Permalink)
In a PDF viewer it looks nice (even though it needs quite a bit of zooming):
but when converting with
convert "file.pdf" "/tmp/file.png"
the produced image gets an extremely low resolution:
When adding density and resize information, I get somewhat bigger images, but still not the original resolution that is stored within the PDF (certainly not 300 DPI):
convert -density "300" -resize "3000x3000>" "file.pdf" "/tmp/file.png"
When using Poppler-Utils' pdfimages, I'm getting the appropriate image:
My question is: Is there any way to tell ImageMagick to extract the images in the "correct" resolution (as is stored in the PDF document)? In other words, ignore the zoom that is necessary to view the PDF properly, thus extracting the correct image resolution?
I'm using ImageMagick 7.1.0.16 with Ghostscript 9.55.0 inside an Alpine Linux docker image.
Very unusual structure you have there. It has been through many changes, but we can guess that some pages were converted to 300 dpi or 600 dpi, since they all render at roughly the same size.
Note that graphics dpi is subjective. It is not that value that is used inside a PDF; what counts is the number of pixels per 72 default units (points, i.e. one inch on the page), and that is what relates to a graphic's working dpi. The image may have been 75 dpi but stored at 300 pixels per 72 points.
A first analysis says the images are:
image-0028 = 714 x 900 dots nominally 600 dpi
image-0002 = 726 x 900 dots nominally 600 dpi
image-0005 = 674 x 900 dots nominally 600 dpi
image-0008 = 674 x 900 dots nominally 600 dpi
image-0011 = 674 x 900 dots nominally 600 dpi
image-0014 = 674 x 900 dots nominally 600 dpi
but all have been scaled down to various sizes of approx. 1.2" x 1.5", so a sensible source size to match all those reductions is possibly 9.6" x 12" with some cropping.
Thus to get the nearest original quality, extract the pages @ 600 dpi (lossless PNG would be best, so the existing lossy JPEG flaws are kept as they are rather than compounded).
Reconverting them to 75 dpi should then give you the closest match to the poor quality inputs.
You need a much larger density, and in ImageMagick the resize has to come after reading the input.
This will be 5800 × 7200 pixels:
convert -density 4800 IMSLP358086-PMLP578359-Ehr_OP_20_5.pdf[1] x.png
This will be 2417 × 3000 pixels:
convert -density 4800 IMSLP358086-PMLP578359-Ehr_OP_20_5.pdf[1] -resize "3000x3000>" y.png
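The two pixel sizes quoted above follow from simple arithmetic, sketched below in Python. The page size of 87 x 108 pt is an assumption inferred from 5800 x 7200 px at density 4800; it is not read from the PDF. The helper names are hypothetical, and shrink_to_fit only imitates what the ">" flag of -resize does (shrink when larger than the box, keep the aspect ratio):

```python
def rendered_size(page_pts, density):
    """Pixel size of a page rasterized at `density` dpi (1 pt = 1/72 in)."""
    w_pt, h_pt = page_pts
    return round(w_pt * density / 72), round(h_pt * density / 72)

def shrink_to_fit(size, box):
    """Imitate -resize 'WxH>': only shrink, preserving the aspect ratio."""
    w, h = size
    box_w, box_h = box
    scale = min(box_w / w, box_h / h)
    if scale >= 1:
        return size  # already fits, leave untouched
    return round(w * scale), round(h * scale)

page = (87, 108)  # ASSUMED page size in points, inferred from the sizes above
full = rendered_size(page, 4800)
print(full)                               # (5800, 7200)
print(shrink_to_fit(full, (3000, 3000)))  # (2417, 3000)
print(rendered_size(page, 72))            # (87, 108)
```

Under that assumption the last line also shows why a plain convert without -density gives a tiny image: at 72 dpi a page this small is under 110 pixels on its longer side.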

Draw rectangle with Ghostscript (using PostScript language)

I'm trying to draw a rectangle and output it to a PDF using Ghostscript.
If I put the following PostScript code in a file named rect.eps, I get what I want:
newpath
100 100 moveto
0 100 rlineto
100 0 rlineto
0 -100 rlineto
-100 0 rlineto
closepath
gsave
0 0 0 setrgbcolor
fill
stroke
showpage
But if I try to include that PostScript into my Ghostscript-command, I just get a blank page:
gs -o rect.pdf -sDEVICE=pdfwrite -g300x300 -c "newpath 100 100 moveto 0 100 rlineto 100 0 rlineto 0 -100 rlineto -100 0 rlineto closepath gsave 0 0 0 setrgbcolor fill stroke showpage"
What am I doing wrong, shouldn't it be possible to draw a rectangle with Ghostscript?
Best Regards
Niclas
Stefan's comment is effectively correct.
You have set a media size in pixels of 300x300. Now given that the pdfwrite device's default resolution is 720 dpi, and you haven't changed that, this means that the media size is less than half an inch in each direction.
You have then drawn a rectangle, starting at 100,100 units on the page, and extending by 100 units in each direction. PostScript units are 1/72 of an inch, so your rectangle's lower left corner begins at just over 1 inch up and right.
That's outside the half-inch square defined by your media, so the result is simply that the rectangle is drawn off the page.
If you don't set the media size Ghostscript will use its default, either A4 or Letter depending, and you will see the output. As to why it works when you make an EPS file, I have no idea, I expect there is content in the EPS that you haven't shared which is making a difference.
When creating a PDF file, which is a resolution-independent format, it's better to specify the media size in resolution-independent units, like PostScript units, than pixels.
Note that your code has an additional problem, also mentioned by Stefan: the dangling gsave, which looks like it ought to have a grestore before the stroke. As it is, the stroke will do nothing, because fill clears the current path. I suspect you want:
gsave
0 0 0 setrgbcolor
fill
grestore
stroke
showpage
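To make the unit mismatch concrete, here is a small Python sketch of the arithmetic (helper names are made up for illustration). It converts the -g pixel size to points at pdfwrite's default 720 dpi and checks whether the rectangle overlaps the page at all:

```python
def media_size_points(pixels, dpi):
    """Convert a media dimension given in device pixels to points (1/72 in)."""
    return pixels * 72.0 / dpi

def rect_visible(x, y, w, h, page_w, page_h):
    """True if any part of the rectangle overlaps the page (origin lower left)."""
    return x < page_w and y < page_h and x + w > 0 and y + h > 0

side = media_size_points(300, 720)  # -g300x300 at 720 dpi
print(side)                                          # 30.0
print(rect_visible(100, 100, 100, 100, side, side))  # False
print(rect_visible(100, 100, 100, 100, 300, 300))    # True
```

The last line corresponds to a page that is 300 x 300 points rather than 300 x 300 pixels, which you could request with Ghostscript's -dDEVICEWIDTHPOINTS=300 -dDEVICEHEIGHTPOINTS=300 instead of -g300x300.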

Tiling an image over a page with ImageMagick with print margins?

I am trying to get ImageMagick to do something for me and I am running into a few problems. First, I am not understanding units of measure and such passed into ImageMagick and so my script is not producing what I need. Second, the way I am doing it is extremely inefficient. Running this script takes a very long time (the one you see below is slightly trimmed down from what I am running).
So to what I am doing... I have a number of svg files with icons in them. I am looking to generate a page for each of these files. The page generated will contain the icon tiled over the entire page with a margin on each side. I am looking for 1/2 inch tiles with 1/2 inch margins around the page, which needs to be US Letter (8 1/2 x 11 inch).
After reading a lot of the documentation this is what I came up with.
colors=(red blue purple yellow green black)
mkdir -p generated/icons/
for color in "${colors[@]}"; do
images=`printf "source/icons/${color}.svg%.0s " {1..300}`
montage $images -tile 15x20 -page Letter+1+1 -units PixelsPerInch -density 2550x3300 \
generated/icons/${color}.pdf
done
So for each of my files I run montage. I use printf to repeat the image file name 300 times. I then tile this 15x20 times. 15x20 comes from 8.5 minus 1 inch margins = 7.5*2 = 15 and likewise (11-1)*2 = 20. 300 images come from 15*20. I then say I want this on a letter page offset 1x1. (This was my attempt at a margin) I say I am speaking in pixel per inch (but none of the units seem to match up). I set the dpi to 300 by the density command where 8.5*300 = 2550 and 11*300 = 3300.
I've been toying with other settings (geometry etc.) but none of these are working. And the units don't seem to make sense either... Right now my resultant pdf is a square etc...
How do I make tiled pages as such? Also is there a way for me to do this more efficiently? What I have thus far is very slow.
EDIT:
Some more information:
i:montage --version
Version: ImageMagick 6.8.8-10 Q16 x86_64 2015-03-10 http://www.imagemagick.org
tile image:
my current output:
Notice the margins are not right, the page is square rather than Letter, and the tiles are skewed.
Given the PNG image you provided, I presume you want a 1/2 inch border of white all around inside an 8.5x11 inch printed image. Thus the tiled width would be 7.5 inches and the tiled height would be 10 inches.
At 300 dpi, 1 inch = 300 px, so border thickness = 0.5*300 = 150 px = 1 tile thick
11-1 = 10 inches tall for tiled region height = 10*300 = 3000 px
8.5-1 = 7.5 inches wide for tiled region width = 7.5*300 = 2250 px
1 tile = 0.5 inches at 300 dpi = 0.5*300 = 150 px
convert lUDbK.png -resize "150x150!" -write mpr:tile +delete -size 2250x3000 tile:mpr:tile -bordercolor white -border 150 -units pixelsperinch -density 300 tiled_page.png
Time to process was 1.75 sec on my Mac Mini.
This produces an image which is rather large. You will have to extract the image to see the border, since this page background is white.
(Note that PNG stores density only in metric units, which IM reports as pixelspercentimeter, and IM converts my specification of pixelsperinch accordingly. So if you look at the metadata, it will probably show you some other density in units of pixelspercentimeter, about 118.11. That corresponds to the desired 300 dpi.)
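The layout arithmetic can be checked with a short Python sketch. It uses the question's figures (US Letter, half-inch margins, half-inch tiles, 300 dpi); the function and key names are hypothetical:

```python
def tiled_page_layout(page_in=(8.5, 11.0), margin_in=0.5, tile_in=0.5, dpi=300):
    """Return pixel sizes for a page tiled inside a uniform white margin."""
    page_px = tuple(round(side * dpi) for side in page_in)
    border_px = round(margin_in * dpi)
    tile_px = round(tile_in * dpi)
    # The tiled region is the page minus the border on both sides.
    region_px = tuple(side - 2 * border_px for side in page_px)
    # Number of whole tiles that fit across and down the region.
    grid = tuple(side // tile_px for side in region_px)
    return {"page_px": page_px, "border_px": border_px, "tile_px": tile_px,
            "region_px": region_px, "grid": grid}

layout = tiled_page_layout()
print(layout["page_px"], layout["region_px"], layout["grid"])
# (2550, 3300) (2250, 3000) (15, 20)
```

Whatever margin you pick, the tiled region plus twice the border has to add up to the page size in pixels; otherwise the result is not Letter sized.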

GraphicsMagick crop PDF

I've got an 8.5x11 PDF at 300dpi. It has a single UPC label in the top left corner of the PDF. Imagine that there could be 30 labels on 1 sheet, but we just have 1 label.
I'm trying to crop the PDF to be just the size of the 1 label. So far I've got this
gm convert -density 300 single.pdf out.pdf
Which doesn't do any cropping. When I crop to say 300x100 it makes a 20MB file with 30000 pages.
I have not a clue how to use -crop to actually crop to the correct size. I need it to be 3.5inches by 1.125 inches.
Using the following input PDF (here converted to a PNG):
the following command will crop the label:
gm convert wiz.pdf -crop 180x50+1+1 cropped.pdf
This label is sized 180x50 pixels.
For an 8.5x11in PDF at 300 PPI you'd have a 2550x3300 pixel page (which I doubt you do, but that's another question) and you'd need to use -crop 1050x337+0+0 (more exactly, 1050x337.5+0+0 -- but you cannot crop half pixels!).
Note, the +0+0 part crops the top left corner. If you need offset to the right by N pixels and to the bottom by M pixels use +N+M...
Using ImageMagick instead...
You could also use ImageMagick's convert command:
convert wiz.pdf[180x50+1+1] cropped.pdf
Comment about image sizes...
One additional comment about this remark:
"I have not a clue how to use -crop to actually crop to the correct size."
There is no other real size for raster images than pixels. ABC pixels wide and XYZ pixels high...
There is no such thing as an absolute, real size for a digital image that you can measure in inches... unless you additionally can state the resolution at which a given image is rendered on a display or a print device!
An 8.50x11in sized image at 300 PPI will translate to 2550x3300 pixels.
However, if your image does not contain this amount of pixels (which is the real, absolute size of any raster image), you may still be able to render it at 300 PPI -- but its size in inches will be different from 8.5x11in!
So, whenever you want to crop, use the absolute number of pixels you want. Don't use resolution/density at all on your command line!
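If you only know the label size in inches, you first have to turn it into pixels at the resolution the page was rasterized at. A minimal Python sketch of that conversion (the function name is made up for illustration; half pixels are rounded down, as in the 1050x337 example above):

```python
import math

def crop_geometry(width_in, height_in, ppi, x_off=0, y_off=0):
    """Build a WxH+X+Y crop geometry string from a size in inches and a known PPI."""
    w = math.floor(width_in * ppi)   # fractional pixels cannot be cropped
    h = math.floor(height_in * ppi)
    return f"{w}x{h}+{x_off}+{y_off}"

print(crop_geometry(3.5, 1.125, 300))  # 1050x337+0+0
```

The resulting string is what goes after -crop; the offsets default to the top left corner.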

Check if PDF is colored or grayscale or black&white [closed]

What are the ways to check if a PDF file is colored or grayscale or black/white?
You can use Ghostscript's inkcov device to get color information about each PDF page. Here is an example command for a sample PDF (cmyk.pdf) of mine with its output:
gs -o - -sDEVICE=inkcov cmyk.pdf
GPL Ghostscript 9.10 (2013-08-30)
Processing pages 1 through 5.
Page 1
0.00000 0.00000 0.00000 0.02231 CMYK OK
Page 2
0.02360 0.02360 0.02360 0.02360 CMYK OK
Page 3
0.02525 0.02525 0.02525 0.00000 CMYK OK
Page 4
0.00000 0.00000 0.00000 0.01983 CMYK OK
Page 5
0.13274 0.13274 0.13274 0.03355 CMYK OK
If you add the -q parameter, the result is this:
gs -q -o - -sDEVICE=inkcov cmyk.pdf
0.00000 0.00000 0.00000 0.02231 CMYK OK
0.02360 0.02360 0.02360 0.02360 CMYK OK
0.02525 0.02525 0.02525 0.00000 CMYK OK
0.00000 0.00000 0.00000 0.01983 CMYK OK
0.13274 0.13274 0.13274 0.03355 CMYK OK
How to interpret these numbers?
Each column represents a color, from left to right: Cyan (C), Magenta (M), Yellow (Y) and Black (K).
A value of 0.00000 represents zero color used.
A value of 1.00000 would mean 100% coverage with the respective color for the sheet.
The value of 0.02360 for each single ink color on page 2 means: each color covers 2.36% of the full page (including Black).
You can see the values for page 1: the same value, 0.00000, for Cyan, Magenta and Yellow, but 0.02231 for Black. This means: page 1 uses black ink only, and 2.231 % of the page's area is covered by black ink.
Take page 2: here each of the 4 inks is given with a value of 0.02360. Each ink is covering 2.36 % of the full page.
Look also at the values for page 3: 0.02525 for C, M and Y and 0.00000 for Black. So this page does not use black ink at all, but uses the same amount of each colored ink to cover an identically sized area of 2.525 % of the full page.
Page 4: result is similar to page 1.
Page 5: See for yourself...
Caveats:
The inkcov device always prints CMYK values, never RGB values. The reason is that it converts all RGB color shades into CMYK before analysing the color coverage of pages. This of course introduces some inaccuracies (which you have to take into account before you rely on this tool).
You need to use a version of Ghostscript 9.05 or later (if you're on MS Windows: v9.07 or later). Previous versions did not have the inkcov device.
You certainly will come across PDF pages which do not appear to contain color but only gray shades when viewed in a PDF viewer or when printed on paper. This is because gray shades can be composed by using equal amounts of different colors.
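If you want to act on these numbers in a script, something like the following Python sketch could do a first pass over the inkcov output. The function name and the three labels are my own, not part of Ghostscript, and the caveats above still apply: "black only" does not distinguish pure black from gray shades, and a "color inks" page with equal C, M and Y may still look gray.

```python
def classify_inkcov(output: str):
    """Return one label per page from `gs -o - -sDEVICE=inkcov` output."""
    labels = []
    for line in output.splitlines():
        parts = line.split()
        # Data lines look like: C M Y K CMYK OK
        if len(parts) < 5 or parts[4] != "CMYK":
            continue
        c, m, y, k = (float(value) for value in parts[:4])
        if c == 0 and m == 0 and y == 0:
            labels.append("blank" if k == 0 else "black only")
        else:
            labels.append("color inks")
    return labels

sample = """Page 1
0.00000 0.00000 0.00000 0.02231 CMYK OK
Page 2
0.02360 0.02360 0.02360 0.02360 CMYK OK
Page 3
0.02525 0.02525 0.02525 0.00000 CMYK OK
"""
print(classify_inkcov(sample))  # ['black only', 'color inks', 'color inks']
```

Header lines such as "Page 1" or the Ghostscript banner are skipped, so the function works with or without -q.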
Update
The following picture roughly reproduces the 5 pages of the cmyk.pdf used above. It should give you an approximate impression of how they look in a PDF viewer, and make it easier to see how the ink coverage values quoted above add up:
Here is the Ghostscript command that I originally used to create the above used cmyk.pdf:
gs \
-o cmyk.pdf \
-sDEVICE=pdfwrite \
-g5950x2105 \
-c "/F1 {100 100 moveto /Helvetica findfont 42 scalefont setfont} def" \
-c "F1 (100% 'pure' black) show showpage" \
-c "F1 .5 .5 .5 setrgbcolor (50% 'rich' rgbgray) show showpage" \
-c "F1 .5 .5 .5 0 setcmykcolor (50% 'rich' cmykgray) show showpage" \
-c "F1 .5 setgray (50% 'pure' gray) show showpage" \
-c " 1 0 0 0 setcmykcolor 100 130 64 64 rectfill" \
-c " 0 1 0 0 setcmykcolor 200 130 64 64 rectfill" \
-c " 0 0 1 0 setcmykcolor 300 130 64 64 rectfill" \
-c " 0 0 0 1 setcmykcolor 400 130 64 64 rectfill" \
-c " 0 1 1 0 setcmykcolor 100 30 64 64 rectfill" \
-c " 1 0 1 0 setcmykcolor 200 30 64 64 rectfill" \
-c " 1 1 0 0 setcmykcolor 300 30 64 64 rectfill" \
-c " 1 1 1 0 setcmykcolor 400 30 64 64 rectfill showpage"
The traditional way of doing this would be to use a preflight tool such as the tools from callas software (Caution: I'm associated with this company). But if this aspect of the PDF is the only aspect you want to check, that's probably going to be overkill.
I would think that the second possible approach would be to use a tool that can convert a PDF to images and then analyse the images (convert to a CMYK image - then see if there is anything on the C, M or Y channels in that generated image).
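As a rough illustration of the image-based approach, here is a Python sketch that classifies already-decoded pixels. It assumes you have rasterized the page with some other tool and have the pixels as (r, g, b) tuples; the function name is hypothetical. It tests r == g == b instead of building a CMYK image, which is equivalent under the simple formula-based RGB to CMYK conversion (a neutral pixel yields no C, M or Y), though an ICC-based conversion may behave differently:

```python
def classify_pixels(pixels):
    """Classify an iterable of (r, g, b) tuples with 0-255 channel values."""
    only_black_and_white = True
    for r, g, b in pixels:
        if not (r == g == b):
            return "color"  # any non-neutral pixel means the page has color
        if r not in (0, 255):
            only_black_and_white = False  # neutral, but a gray shade
    return "black and white" if only_black_and_white else "grayscale"

print(classify_pixels([(0, 0, 0), (255, 255, 255)]))  # black and white
print(classify_pixels([(0, 0, 0), (128, 128, 128)]))  # grayscale
print(classify_pixels([(0, 0, 0), (200, 30, 30)]))    # color
```

Keep in mind that anti-aliased rendering puts gray pixels around the edges of black text, so a page that is conceptually black and white will often come out as "grayscale" unless anti-aliasing is turned off when rasterizing.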
Amyn,
This is Mohammad from LEADTOOLS support. I noticed that you posted a similar question on our LEADTOOLS support forums. I have already posted a reply there and here is a slightly modified copy of that reply:
/******************************************/
If the PDF page contains only black text on a white background, loading it using the default settings will produce gray shades around the text edges to give them a smoother display, as shown in the attached image.
If you want such black text to be rasterized as pure black without gray shades, change the settings before loading using LEADTOOLS v18 as follows:
Set the UsePdfEngine property of the loading PDF options to true like this:
RasterCodecs.Options.Pdf.Load.UsePdfEngine = true;
Set the TextAlpha property of the loading PDF options to 1 like this:
RasterCodecs.Options.Pdf.Load.TextAlpha = 1;
Load the PDF file using default bits per pixel (24-bits):
RasterCodecs.Load("BlackTextWhiteBackground.pdf");
Count the unique colors in the file using the ColorCountCommand Class function. If the number of colors is more than two, the image will not be black and white. This could happen if it contains non-black text or other color images or graphics objects:
ColorCountCommand MyCommand = new ColorCountCommand();
MyCommand.Run(_viewer.Image);
Make sure that the "Leadtools.PdfEngine.dll" is placed in the output folder of your project (next to the EXE).
/******************************************/
Edit to answer comment about detecting gray page:
It is possible to tell whether the page is color or purely shades of gray.
Add the following code after loading as 24-bits and counting the colors:
if (MyCommand.ColorCount > 2 && MyCommand.ColorCount <= 256) // could be gray
{
    ColorResolutionCommand colorRes = new ColorResolutionCommand(ColorResolutionCommandMode.InPlace, 8,
        RasterByteOrder.Bgr, RasterDitheringMethod.None, ColorResolutionCommandPaletteFlags.Optimized, null);
    colorRes.Run(_viewer.Image);
    if (_viewer.Image.GrayscaleMode == RasterGrayscaleMode.None)
        MessageBox.Show("image is NOT grayscale");
    else
        MessageBox.Show("image is grayscale, its mode is: " + _viewer.Image.GrayscaleMode);
}