How to apply a custom halftone to Ghostscript conversion of a PDF file to bitmap

I have a PDF file with a grayscale image that I'm trying to convert to a monochrome bitmap (1 bit per pixel) using Ghostscript. I can get everything to work, but I don't like the way the default grayscale conversion looks, with coarse lines running through it. My understanding is that I can customize the halftone algorithm to my liking, but I can't seem to get the PostScript commands to have any effect on the output. Am I using the 'sethalftone' command incorrectly? Is there a different approach I should consider?
gs -sDEVICE=bmpmono -sOutputFile=test.bmp -dBATCH -dNOPAUSE -r300 -c "<< /HalftoneType 1 /Frequency 40 /Angle 0 /SpotFunction {pop}>> sethalftone" -sPageList=1 input.pdf
I can completely remove the "-c" command line parameter and it makes no difference.
This is what the current mono conversion looks like that I'm trying to improve upon:

Using the default halftone built into Ghostscript has a limited working range and a tendency to produce an appearance that leans to one side (that is deliberate design, not accidental). There is a dither control, but it tends to work effectively at around r/20 to r/5; in this case that means -dDITHERPPI=25 to -dDITHERPPI=60, which reduces the size of the blocks. However, there are other potential controls.
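For reference, a minimal invocation applying just the dither control mentioned above (file names are placeholders):
gs -sDEVICE=bmpmono -r300 -dDITHERPPI=60 -o out.bmp input.pdf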
I am no expert on how these settings are best used (they are plagiarized); we would need to read the PostScript manual or play with their values, but they seem to fare better.
gswin32c -sDEVICE=bmpmono -r300 -o out.bmp -c "<< /Frequency 133 /Angle 45 /SuperCellSize 36 /Levels 255 >> .genordered /Default exch /Halftone defineresource { } settransfer 0.003 setsmoothness" -f ../examples/text_graph_image_cmyk_rgb.pdf
(Note: you may wish to include -dUseFastColor before -c "....)
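For completeness, with that flag added the invocation above becomes:
gswin32c -sDEVICE=bmpmono -r300 -dUseFastColor -o out.bmp -c "<< /Frequency 133 /Angle 45 /SuperCellSize 36 /Levels 255 >> .genordered /Default exch /Halftone defineresource { } settransfer 0.003 setsmoothness" -f ../examples/text_graph_image_cmyk_rgb.pdf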
Changing values
/Angle: common values are 0 (the GS default), 22, 30, 45, 52, 60, and 67; however, /Angle 45 is the most common for this mono conversion. Variations, coupled with a given /Frequency (line screen frequency, roughly LPI) and SuperCellSize, may produce unachievable or subnormal results.
/Frequency (75 = GS default) can be set to different values, but a common ratio is 4/3 = 133%; another is 170. A rule of thumb is r/2, so if you're using 300 dpi, /Frequency 150 may be better in this case. The correct choice comes down to a complex set of factors: a grid at 45 degrees is spaced at sqrt(2) ≈ 1.41421 along the diagonal, so the horizontal and vertical effect is about 0.7071 in terms of page dpi; thus, to match 300 dpi, the LPI frequency may need to be raised accordingly, so consider 212.
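A quick sanity check of those two rules of thumb at r = 300 dpi (plain bc arithmetic, nothing Ghostscript-specific):
echo "300/2" | bc -l        # r/2       -> 150
echo "300/sqrt(2)" | bc -l  # r/sqrt(2) -> ~212.13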
/SuperCellSize eludes me. Integer; default value = 1; the actual cell size is determined by Frequency, Angle, and the H/V resolution. A larger value allows more levels to be attained. It is unclear what the optimum is without a sample of the target, so try a few values (some will trigger error feedback); around 32 or 64 is a reasonable maximum starting point. Note the example uses 36.
To prevent the undesirable moire patterns, Peter Fink proposed the Supercell Method, named the Adobe Accurate Screen, to realize the screen angle having RT with the required accuracy [3]. The supercell designed in the image domain, including m × m (m: integer) identical halftone cells, is one solution to approximate accurately the required screen angle and screen ruling. https://www.imaging.org/site/PDFS/Papers/1998/PICS-0-43/668.pdf
You may also wish to try
/DotShape Integer; default value = 0 (CIRCLE). Other shapes available are:
1=REDBOOK, 2=INVERTED, 3=RHOMBOID, 4=LINE_X, 5=LINE_Y, 6=DIAMOND1, 7=DIAMOND2, 8=ROUNDSPOT.
These values and more can be found at https://ghostscript.com/doc/current/Language.htm
Upper left is a mono area dithered in a graphics app, where, depending on resolution and density, "worms" or "moire" patterns start to appear. To regulate such effects a deliberate screen is applied, and at the smallest unit, aberrations in the linework pattern will not be noticeable unless zoomed beyond the intended scrutiny.
gswin32c -sDEVICE=bmpmono -r300 -o out.bmp -c "<< /Frequency 106 /Angle 45 /SuperCellSize 64 /Levels 255 >> .genordered /Default exch /Halftone defineresource { } " -f test.pdf
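As an untested variation, a /DotShape entry can be added to the same .genordered dictionary; here 6 selects DIAMOND1 from the table above:
gswin32c -sDEVICE=bmpmono -r300 -o out.bmp -c "<< /Frequency 106 /Angle 45 /SuperCellSize 64 /Levels 255 /DotShape 6 >> .genordered /Default exch /Halftone defineresource { } settransfer" -f test.pdf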

ImageMagick can use custom halftones, defined by XML files, so if you can't get the result you need directly with Ghostscript, another option is to use Ghostscript to output a high-resolution greyscale image and then use ImageMagick to produce the black-and-white halftoned image.
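A sketch of that two-step pipeline (h6x6a is one of the stock ordered-dither maps in ImageMagick's thresholds.xml; a custom map added to that file can be referenced by its alias in the same way):
gs -sDEVICE=pnggray -r300 -o gray.png input.pdf
convert gray.png -ordered-dither h6x6a -type bilevel mono.bmp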

Related

I need to detect the approximate location of a QR code in a scanned image (PDF converted to PNG)

I have many scanned documents in PDF.
I use ImageMagick with Ghostscript to convert PDF to PNG at high density, with convert -density 288 2.pdf 2.png. After that I read the pixels with PHP and find where the QR code is and decode it. Because the image is very big (~2500 px), it needs a lot of RAM. Before reading the pixels with PHP, I want to crop the image with ImageMagick and leave only the part with the QR code.
Can I detect the approximate location of the QR code with ImageMagick, then crop and keep only that part?
Sample PDF
Converted PNG
Further Update
I see your discussion with Kurt about better extraction of the image from the PDF in the first place; his recommendation was to use pdfimages. I just wanted to add that you won't find it if you do brew search pdfimages; you actually need to use
brew install poppler
and then you get the pdfimages executable.
Updated Answer
If you change the tile size to 100x100 in the crop command and run this for the second PDF you supplied:
convert -density 288 pdf2.pdf -crop 100x100 tile%04d.png
and then use the same entropy analysis command
convert -format "%[entropy]:%X%Y:%f\n" tile*.png info: | sort -n
...
...
0.84432:+600+3100:tile0750.png
0.846019:+600+2800:tile0678.png
0.980938:+700+400:tile0103.png
0.984906:+700+500:tile0127.png
0.988808:+600+400:tile0102.png
0.998365:+600+500:tile0126.png
The last 4 listed tiles are
Likewise for the other PDF file you supplied, you get
0.863498:+1900+500:tile0139.png
0.954581:+2000+500:tile0140.png
0.974077:+1900+600:tile0163.png
0.97671:+2000+600:tile0164.png
which means these tiles
I would think that should help you locate the QR code, at least approximately.
Original Answer
This is not all that scientific, but it may help you get started. The key, I think, is the entropy of the various areas of the image. The QR code has a lot of information encoded in a small area so it should have high entropy. So, I use ImageMagick to split the image into square 400x400 tiles like this:
convert image.png -crop 400x400 tile%03d.png
which gives me 54 tiles. Then I calculate the entropy of each tile and sort them by increasing entropy, also outputting their offsets from the top left of the frame and their names, like this:
convert -format "%[entropy]:%X%Y:%f\n" tile*.png info: | sort -n
0.00408949:+1200+2800:tile045.png
0.00473755:+1600+2800:tile046.png
0.00944815:+800+2800:tile044.png
0.0142171:+1200+3200:tile051.png
0.0143607:+1600+3200:tile052.png
0.0341039:+400+2800:tile043.png
0.0349564:+800+3200:tile050.png
0.0359226:+800+0:tile002.png
0.0549334:+800+400:tile008.png
0.0556793:+400+3200:tile049.png
0.0589632:+400+0:tile001.png
0.0649078:+1200+0:tile003.png
0.10811:+1200+400:tile009.png
0.116287:+2000+3200:tile053.png
0.120092:+800+800:tile014.png
0.12454:+0+2800:tile042.png
0.125963:+1600+0:tile004.png
0.128795:+800+1200:tile020.png
0.133506:+0+400:tile006.png
0.139894:+1600+400:tile010.png
0.143205:+2000+2800:tile047.png
0.144552:+400+2400:tile037.png
0.153143:+0+0:tile000.png
0.154167:+400+400:tile007.png
0.173786:+0+2400:tile036.png
0.17545:+400+1600:tile025.png
0.193964:+2000+400:tile011.png
0.209993:+0+3200:tile048.png
0.211954:+1200+800:tile015.png
0.215337:+400+2000:tile031.png
0.218159:+800+1600:tile026.png
0.230095:+2000+1200:tile023.png
0.237791:+2000+0:tile005.png
0.239336:+2000+1600:tile029.png
0.24275:+800+2400:tile038.png
0.244751:+0+2000:tile030.png
0.254958:+800+2000:tile032.png
0.271722:+2000+2000:tile035.png
0.275329:+0+1600:tile024.png
0.278992:+2000+800:tile017.png
0.282241:+400+1200:tile019.png
0.285228:+1200+1200:tile021.png
0.290524:+400+800:tile013.png
0.320734:+0+800:tile012.png
0.330168:+1600+2000:tile034.png
0.360795:+1200+2000:tile033.png
0.391519:+0+1200:tile018.png
0.421396:+1200+1600:tile027.png
0.421421:+2000+2400:tile041.png
0.421696:+1600+2400:tile040.png
0.486866:+1600+1600:tile028.png
0.489479:+1600+800:tile016.png
0.611449:+1600+1200:tile022.png
0.674079:+1200+2400:tile039.png
and, hey presto, the last one listed (i.e. the one with the highest entropy) tile039.png is this one.
I have drawn a rectangle around its location using this command
convert image.png -stroke red -fill none -strokewidth 3 -draw "rectangle 1200,2400 1600,2800" a.jpg
I concede there may be luck involved, but I only have one image to test my mad theories. You may need to tile twice, the second time with an x-offset and y-offset of half a tile width, so that you don't cut the QR code and split it across 2 tiles. You may need different size tiles for different size barcodes. You may need to consider the last 3-5 tiles located for your next algorithm. But I think it could form the basis of a method.
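To automate that last step, here is a small shell sketch along the same lines (it assumes the 400x400 grid and tile naming from above, and simply crops out the single highest-entropy tile):
best=$(convert -format "%[entropy]:%X%Y\n" tile*.png info: | sort -n | tail -1)
offsets=${best#*:}                      # e.g. "+1200+2400"
x=$(echo "$offsets" | cut -d+ -f2)
y=$(echo "$offsets" | cut -d+ -f3)
convert image.png -crop "400x400+${x}+${y}" +repage qr_region.png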

Raw pdf color conversion (with known conversion formula) from RGB to CMYK

This question is related to
Script (or some other means) to convert RGB to CMYK in PDF?
however, this one is way more specific. Consider that I am not an expert in print production ;)
Situation: For printing I am only allowed to use two colors, Cyan and Black. The printery requests that the final PDF be in DeviceCMYK with only the channels C and K used.
pdflatex automatically does that (with the xcolor package) for all fonts and drawn objects; however, I have more than 100 sketches/figures in PDF format which are embedded in the manuscript. Due to an admittedly badly designed workflow (the late realization that Inkscape cannot export CMYK PDFs), all these figures were created in Inkscape and thus are RGB PDFs.
However, the only colors used within Inkscape were the RGB complements of CMY(K), e.g. 100% Cyan is (0,255,255) in RGB, 50% K is (127,127,127), etc.
Problem: I need to convert all these PDF figures from RGB to DeviceCMYK (or alternatively the whole PDF of the final manuscript) with a specific conversion formula.
I did a lot of Google research and tried the often-suggested approaches using e.g. Ghostscript or various print production tools in Adobe Acrobat; however, all of the conversion techniques I found so far wanted to use ICC color profiles, or used some other conversion strategy which filled the M and Y channels and spared some C and K, for example.
I know the exact conversion formula for the raw color numbers from our Inkscape RGBs to the channels C and K, but I do not know of or cannot find any program or tool that allows me to manually specify conversion formulas.
Question: Is there any workflow to convert my PDFs from RGB to C(MY)K manually, with my own specific conversion formula for the raw numbers and with the converted PDF being in DeviceCMYK, using a tool, script or Adobe product?
Due to the large number of figures I would prefer a batch solution which doesn't require too much coding on my side, but if it is the only solution, I'd also be open-minded about a workflow like "load/convert/save" within a program for every single figure, or writing a small program with an easy-to-handle C++ PDF API, for example.
Limitations and additional info: A different file format (like TikZ figures) is no longer possible, since it does not work perfectly and the necessary adaptations to the figures would create too much overhead. Possibly helpful information: since the figures were created in Inkscape, there are no raster images within the PDFs. I also do not want the figures to be converted to raster images during the color conversion.
Edit:
I have created an example of an RGB PDF figure created with Inkscape.
I also did a manual object-by-object color conversion to a CMYK PDF with Illustrator, to show how the result should look. Illustrator stores the axial shading in a DeviceN colorspace with the colors cyan and black, which is close enough^^
Here is an idea; I think it will work if your PDF files use exclusively the colorspaces DeviceGray, DeviceRGB and DeviceCMYK:
1- Convert all your PDF files to Postscript (with pdf2ps from ghostscript for example)
2- Write a Postscript program that redefines the operators setrgbcolor, setgray and setcolor with your own implementations in the Postscript language; your implementations will internally use setcmykcolor and compute the values using your custom formula (a batch sketch follows after step 4).
Here is an example for redefining the setgray operator:
% The operator setcmykcolor expects 4 values in the stack
% When setgray is called, we can expect to have 1 value in the stack, we will
% use it for the black component of cmyk by adding 3 zeros and rolling the
% top 4 elements of the stack 3 times
/setgray { 0 0 0 4 3 roll setcmykcolor } bind def
3- Paste your Postscript program at the beginning of each resulting ps file from step 1.
4- Convert all your files back to PDF (with ps2pdf for example)
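Putting the four steps together, a batch sketch could look like this. The setrgbcolor redefinition is only a placeholder showing where your own formula goes; it assumes the palette from the question (pure grays with r = g = b, cyan tints with g = b = 1). Note that this prolog's setgray also inverts the value (1 exch sub), unlike the minimal demo below, which only illustrates the redefinition mechanism:
cat > prolog.ps <<'EOF'
% K = 1 - gray: setgray 0 means black, but K = 0 means no ink
/setgray { 1 exch sub 0 0 0 4 3 roll setcmykcolor } bind def
% Grays (r = g) go to K only; everything else is assumed to be a cyan tint (g = b = 1)
/setrgbcolor {
  pop 1 index eq                                % drop b, test r = g
  { 1 exch sub 0 0 0 4 3 roll setcmykcolor }    % gray: K = 1 - r
  { 1 exch sub 0 0 0 setcmykcolor }             % cyan tint: C = 1 - r
  ifelse
} bind def
EOF
for f in fig*.pdf; do
  pdf2ps "$f" "${f%.pdf}.ps"
  cat prolog.ps "${f%.pdf}.ps" > "${f%.pdf}-cmyk.ps"
  ps2pdf "${f%.pdf}-cmyk.ps" "${f%.pdf}-cmyk.pdf"
done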
See it in action by saving this piece of code as sample.ps:
/setgray { 0 0 0 4 3 roll setcmykcolor } bind def
0.5 setgray
0 0 moveto
600 600 lineto
stroke
showpage
Convert it to PDF with Ghostscript using this command line (I used version 9.14):
gswin64c.exe -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=sample.pdf sample.ps
The resulting PDF will have the following page content:
q 0.1 0 0 0.1 0 0 cm
/R7 gs
10 w
% The K operator is the PDF equivalent of setcmykcolor in postscript
0 0 0 0.5 K
0 0 m
3000 3000 l
S
Q
As you can see, the PS -> PDF conversion will preserve the CMYK colors specified in PostScript with the setcmykcolor operator.
Maybe you can post your formula as a new question and someone could help you translate it to PostScript.
Since you have access to Illustrator, you might want to try importing the PDF into Illustrator and using Illustrator's scripting capabilities to iterate over the elements and replace fill/stroke RGB colors with their CMYK replacement colors.
The difficulty will be with the shading patterns (gradients) used in the PDF; if they are imported as GradientColor, then in theory it's a matter of digging into the GradientColor to find the base RGB colors and substituting their CMYK replacements.
A very similar problem was solved using the ActivePDF.dll with C++ (or C#??).

how to change all colours in a PDF to their respective complementary colours; how to make a PDF negative

How can all colours in a PDF be changed to their complements? I mean that a document consisting of black text on a white background would be changed to a document consisting of white text on a black background; any red colours in a document would be changed to turquoise colours, and so on. Is there some standard utility that could be used for this purpose, or am I likely to have to contrive some awkward ImageMagick image conversions?
EDIT: Here's a very manual way of doing this using ImageMagick:
convert -density 300 -quality 100 "${fileName}" tmp.png
mogrify -flatten *.png
mogrify -negate *.png
convert *.png "${fileName}"_1.pdf
EDIT: I changed the wording for the purposes of clarity.
I can think of at least 3 ways to invert (negate, complement) the colors of a PDF page description -- I mean treating the page content as a black box, therefore not counting diving directly into the page content and messing around as per Dingo's answer. Unfortunately, ready-made free tools (Ghostscript, mainly) provide an incomplete solution and require manual intervention.
Note that all specific terms used below require at least some knowledge of the basics of the PDF and PostScript Language References, and are presented here somewhat simplified; please refer to the manuals or Google for a thorough description.
The most obvious method is to use an inverting transfer function. A transfer function (TF) expects an argument in the range 0..1, which is an (additive) color component, and returns a color value, too. A negating TF is, of course, {1 sub neg}, and it is easy to inject:
gs -q -sDEVICE=pdfwrite -o out.pdf -c '{1 sub neg} settransfer' -f in.pdf
That's great, and Adobe Reader displays our out.pdf as negated (see below). But here the 'greatness' ends. All other viewers ignore the TF, (probably) considering it to be device-dependent and actually present in the PDF as compensation for output device peculiarities (non-linear printer response etc.), and therefore something to be ignored when displaying the PDF on-screen. Further, depending on the Reader version, negation of black-on-white text leads to either white-on-black or yellowish-on-black text. And that's not great.
Therefore, we need not only TF injection, but also a way to properly apply the TF to the PDF content before viewing. And, regardless of the ps2pdf section of Ghostscript's manual saying:
Currently, the transfer function is always applied
(of the three options: Apply, Preserve, Remove),
using the current 9.10 version I couldn't make Ghostscript actually apply the TF (i.e. modify the page description operators) when outputting to high-level devices (pdfwrite, as opposed to image output). Maybe I'm missing something here.
But, Adobe Distiller, with proper options set, does apply TF to input postscript file.
Somewhat related to the TF is the use of inverting device-link color profiles, which are simple identity DL profiles with inverting (input or output) curves. That's an interesting use of an interesting technology, but, again, Ghostscript currently doesn't support proper color management (and DL profiles) in PDF-to-PDF workflows. Moreover, Adobe Acrobat doesn't know what to do with DL profiles; their use within Acrobat requires expensive third-party plugins.
If a PDF viewer (renderer) claims to support PDF 1.4 and transparency (they all do nowadays), that's another way to go. The PDF Reference says that if the current blending mode is Difference and we paint with white, it effectively inverts the backdrop. So we explicitly paint the background with white (if there's no background, there's nothing to invert), then put down our current content (treating it as a black box), then set the blending mode to Difference and paint on top with white. Is that clear? Again, I had no success setting the blending mode using Ghostscript, with:
[ /BM /Difference /SetTransparency pdfmark
It works OK with Distiller but is ignored by Ghostscript. Maybe (again) I'm missing something.
OK, to round up (the answer is getting somewhat long), here's a Perl solution for the 3rd method using a proper API (it's a programming site, isn't it? Any programming language and appropriate API will do):
use strict;
use warnings;
use PDF::API2;
use PDF::API2::Basic::PDF::Utils;
my $pdf = PDF::API2->open('adobe_supplement_iso32000.pdf');
for my $n (1..$pdf->pages()) {
    my $p = $pdf->openpage($n);
    $p->{Group} = PDFDict();
    $p->{Group}->{CS} = PDFName('DeviceRGB');
    $p->{Group}->{S} = PDFName('Transparency');
    my $gfx = $p->gfx(1); # prepend
    $gfx->fillcolor('white');
    $gfx->rect($p->get_mediabox());
    $gfx->fill();
    $gfx = $p->gfx(); # append
    $gfx->egstate($pdf->egstate->blendmode('Difference'));
    $gfx->fillcolor('white');
    $gfx->rect($p->get_mediabox());
    $gfx->fill();
}
$pdf->saveas('out.pdf');
Here I take one of Adobe's documents and invert it.
What's important: the page should have its transparency blending space set to RGB explicitly, because Adobe Reader defaults to CMYK, and you probably don't want colors inverted in CMYK. Pure CMYK black 0-0-0-100 inverts to 100-100-100-0, which is (nearly) black, too. RGB black gives something like 70-60-50-70 in CMYK, which inverts to a brown 30-40-50-30, and you don't want that. That's why I add a Group entry to the page dictionaries.
Your question seems to be very similar to this:
Change background color of pdf
but you also want to change the colour of the text,
so you can follow the workflow I suggested some time ago for the same task:
----------------
A vector PDF background (meaning not a raster image) in PDF files can be easily changed in a couple of steps (see also my Stack Overflow answer, which I'll now extend and improve):
Change background color of pdf
PRELIMINARY CHECK:
Open your PDF file with an editor able to show the internal PDF structure, like Notepad++
- http://notepad-plus-plus.org/download/v6.1.8.html
and verify whether you can see code snippets like
0.000 0.000 0.000 rg (it means *black*)
1.000 1.000 1.000 rg (it means *white*)
and so on...
(The code snippets can differ; for instance, in PDFs produced by OpenOffice's internal PDF export feature, the same code snippets are in this form:
0 0 0 rg (it means *black*)
1 1 1 rg (it means *white*)
and so on...)
If you are able to see these code snippets, then you can start to change values; otherwise, you need to decompress the text streams.
You can perform this task with pdftk
http://www.pdflabs.com/docs/install-pdftk/
pdftk file.pdf output uncompressed.pdf uncompress
and recompress after you finish the changes:
pdftk uncompressed.pdf output recompressed.pdf compress
Now, if you see these code snippets, you can change the values.
STEP 1 (for PDF editing) -
The first thing you need is to find the right equivalence between the RGB color values of the text and background and the internal PDF representation of the same colors.
Since it seems you are a windowsian inhabitant from the third planet in the Microsoft constellation, you can use a free color picker like this
http://www.iconico.com/download.aspx?app=ColorPic&type=free
to identify the RGB values of the text and background colors.
Once you have these values, you need to convert them into the special internal PDF representation.
To do this, keep in mind this proportion: 1:255 = x:(the color value you selected).
For instance, say you have this RGB triplet for the background:
30,144,255
To find the corresponding values to insert into the PDF code snippet that changes the background color, you compute (you can use http://www.wolframalpha.com/ to compute with precision):
1:255 = x:30 -> 30/255 = 0.117 (approximated to three decimals)
1:255 = x:144 -> 144/255 = 0.564 (approximated to three decimals)
1:255 = x:255 -> 255/255 = 1
So the whole triplet in the PDF, corresponding to RGB 30,144,255, will be:
0.117 0.564 1.000
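If you have many colors to convert, a small one-liner can do the same arithmetic (truncating to three decimals, to match the values above):
awk 'BEGIN { for (i = 1; i < ARGC; i++) printf "%.3f ", int(ARGV[i]/255*1000)/1000; print "" }' 30 144 255
# prints: 0.117 0.564 1.000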
STEP 2 (for PDF editing)
We look for 0.117 0.564 1.000 in the PDF file with Notepad++ ("wrap around" and "match whole word only" need to be checked), and we find the internal PDF representation of the background, which we can change from azure to, let's say, white:
1.000 1.000 1.000
or
1 1 1
But, since you wrote about a black background, to be more precise I created a sample PDF with a white background and black text:
http://ge.tt/1N7Vuz91/v/0
Since we know that 0.000 0.000 0.000 rg means black, we look for this, and we can change 0.000 0.000 0.000 rg to 1.000 1.000 1.000 rg (white) BUT...
at the same time, if your text is black and you want to change its color to white, you first need to change the text from black to some other color, otherwise it will be invisible: white on white.
So we cannot simply change the white background directly to black at once, since doing so would leave no difference between the text color and the background values.
So we act as follows:
First we change the white background from 1.000 1.000 1.000 into something like
0.5 0.5 0.5 (light grey)
http://ge.tt/1N7Vuz91/v/1 (resulting pdf - intermediate step)
Then we look for
0.000 0.000 0.000 (black text) and change it to *white*
1.000 1.000 1.000
resulting intermediate pdf file:
http://ge.tt/1N7Vuz91/v/2
Finally, we change the color of the background again, from
0.5 0.5 0.5 (light grey)
to black
0.000 0.000 0.000
and we now have a vector PDF with white text and a black background:
http://ge.tt/1N7Vuz91/v/3
Please remember to:
1 - compress this PDF you modified again, if you uncompressed it with pdftk
2 - repair it:
pdftk file.pdf output fixed.pdf
There is another way, starting from PostScript, to perform the same task, but since you are a windowsian, I guess the PostScript way is the harder way for you. If someone (a linuxian from the Torvalds constellation) is interested, I can explain how to do the same thing in PostScript; not in this post, to avoid being too verbose.
Give feedback, please, and feel free to ask more.

How to correctly crop PDF with uneven text margins

I have a PDF like this:
where all margins relative to the text content differ on a per-page basis.
Is there any tool that can correct this for me?
I know Scan Tailor can do this on bitmaps, but this is a PDF with just a text layer, so I'm not after a solution that would involve bitmaps at any stage.
Update:
OK, for me there is no need to try to run PDFCrop on Windows, as its main feature is provided by Ghostscript. This command (taken from the pdfcrop Perl script):
gswin32c.exe -dSAFER -dNOPAUSE -dBATCH -q -r72 -sDEVICE=bbox -f input.pdf 2> bbox.txt
produces a bbox.txt file with the text content dimensions, as if there were no margins (the bounding box). It looks like this:
%%BoundingBox: 91 259 474 757
%%HiResBoundingBox: 91.000000 259.000000 474.000000 757.000000
%%BoundingBox: 85 224 470 768
%%HiResBoundingBox: 85.000000 224.000000 469.375000 768.000000
%%BoundingBox: 102 217 489 768
%%HiResBoundingBox: 102.000000 217.000000 488.457031 768.000000
...
where the first two numbers are the lower-left corner x,y values and the remaining two are the upper-right corner, measured from the lower-left edge (in pixels/points).
This can be read with the user's language of choice, the bboxes corrected as desired, and then passed back to Ghostscript, e.g. as referenced here: Cropping a PDF using Ghostscript 9.01
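For a single-page file, the loop back into Ghostscript can be sketched like this (a multi-page version would emit one pdfmark per page; the /CropBox pdfmark is the approach described in the linked answer):
read -r _ x0 y0 x1 y1 < <(grep -m1 '%%HiResBoundingBox' bbox.txt)
gs -q -sDEVICE=pdfwrite -o cropped.pdf -c "[/CropBox [$x0 $y0 $x1 $y1] /PAGES pdfmark" -f input.pdf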
If you are sure that only text is involved (and not images with text drawn on them, or paths drawing symbols), you can quite easily build such a tool in Java using iText (or, most likely, in some .NET language using iTextSharp) with the parser package functionality.
The book iText in Action, 2nd edition, in chapter 15.3.4 shows how to find the text margins, and the sample code can be found in ShowTextMargins.java in the SourceForge iText SVN repository.
By manipulating the MediaBox entries of the individual pages you can then adapt the margins as desired.

Convert multipage PDF to PNG and back (Linux)

I have a lot of PDF documents that I want to convert to PNG, edit in Gimp, and then save back to a multipage Acrobat file. I'm filling out forms and adding a scanned signature, trying to avoid printing, signing, and then scanning back in, while keeping the ability to type the information I need to enter.
I've been trying to use Imagemagick to convert to png files, which seems to work fine. I use the command convert -quality 100 -density 300x300 multipage.pdf single%d.png
(I'm not really sure if the quality parameter is right for png).
But I'm having problems with saving back to PDF. Some of the files have the wrong page size, and I've tried every command and procedure I can find, but there are always a few odd sizes. The resolution seems to vary so that it looks good at a certain zoom level, but either a few pages are specified at about 2" wide, or they are 8.5x11 while the others are about 35" wide. I've tried making sure Gimp had the canvas size and resolution correct, and saving the resolution in the file, but that doesn't seem to matter.
The command I use to save the files is convert -page letter -adjoin single*.png multipage.pdf I've tried other parameters, but none seemed to matter.
If anyone has any ideas or alternatives, I'd appreciate it.
"I'm not really sure if the quality parameter is right for PNG."
For PNG output, the -quality setting is very unlike JPEG's quality setting (which is simply an integer from 0 to 100).
For PNG it is composed of two single digits:
The first digit (tens) is (largely) the zlib compression level, and it may go from 0 to 9.
(However, the setting of 0 has a special meaning: when you use it you'll get Huffman compression, not zlib compression level 0. This is often better... Weird but true.)
The second digit is the PNG data encoding filter type (before it is compressed):
0 is none,
1 is "sub",
2 is "up",
3 is "average",
4 is "Paeth", and
5 is "adaptive".
In practical terms that means:
For illustrations with solid sequences of color a "none" filter (-quality 00) is typically the most appropriate.
For photos of natural landscapes an "adaptive" filtering (-quality 05) is generally the best.
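In other words (file names are placeholders):
convert drawing.pdf -quality 00 drawing.png   # "none" filter: flat-color artwork
convert photo.pdf -quality 05 photo.png       # "adaptive" filter: photographic content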
"I'm having problems with saving back to PDF. Some of the files have the wrong page size, and I've tried every command and procedure I can find [...] but either a few pages are specified at about 2" wide, or they are 8.5x11 but the others are about 35" wide."
Not having your PNG files available, I created a few simple ones with different dimensions to verify the different commands (as I wasn't sure myself any more). Indeed, the one you used:
convert -page letter -adjoin single*.png multipage.pdf
does create all PDF pages in the (same) letter size, but it places my sample of (differently sized) PNGs always in the lower-left corner of the PDF page. (Should a PNG exceed the PDF page size, it does get scaled down to fit -- but smaller PNGs are not scaled up to fill the available page space.)
The following modification to the command will place the PNGs into the center of each PDF page:
convert \
-page letter \
-adjoin \
single*.png \
-gravity center \
multipage.pdf
If this is still not good enough for you, you can enforce a (possibly non-proportional!) scaling to almost fill the letter area by adding a -scale '590!x770!' parameter (this will leave a border of 11 pt at each edge of the page):
convert \
-page letter \
-adjoin \
single*.png \
-gravity center \
-scale '590!x770!' \
multipage.pdf
To do away with the extra border, use -scale '612!x792!'. -- Should you want only upward scaling to happen, if required, while keeping the aspect ratio of the PNG, use -scale '590<x770<':
convert \
-page letter \
-adjoin \
single*.png \
-gravity center \
-scale '590<x770<' \
multipage.pdf
Why not just use Xournal? That's what I use to annotate PDFs.