PDF to EPS or PS to EPS conversion maintaining page size - pdf

I need to convert a PDF or Postscript file to EPS, I tried using Ghostscript with the following command to convert Postscript to EPS:
gswin32.exe -o output.eps -sDEVICE=eps2write -dFitPage input.ps
Or PDF to EPS:
gswin32c.exe -q -dNOCACHE -dNOPAUSE -dBATCH -dSAFER -sDEVICE=eps2write -o output.eps -dFitPage input.pdf
They both complete successfully but they are not maintaining the page size. The input PDF or PS files are the same drawings and they both a page size of 300x300pts. You can download these files here and here. They look like this:
But after converting them to EPS the results are these, PS to EPS and PDF to EPS. They look like this, the first one is the result from PS to EPS and the second one is the result from PDF to EPS (they are opened using EPS Viewer that rasterizes the image that's the reason for the low quality):
As you can see, none of them have the original 300x300 pts size, I've tried many Ghostscript options but I can't manage to get an EPS with the right Bounding Box. I just need to convert a PDF OR PS to EPS, whatever is easier or gives better results.

What you are asking for is, more or less, the exact opposite of what is normally required.
In general people want the EPS Bounding Box to be as tight as possible to the actual marks made by the EPS, because the normal use for an EPS file is to 'embed' it in another document. If you want extra white space you would normally add it around the EPS when you embed it.
Indeed, the EPS specification says that the BoundingBox comment should not include the white space. On page 8 of the EPSF specification:
"For an EPS file, the bounding box is the smallest rectangle that encloses all the marks painted on the single page of the EPS file"
Messing with Ghostscript switches isn't going to do anything helpful for you here, the device explicitly records the marks that are made by the input, and sets the BoundiongBox from those.
Perhaps if you were to explain why you want to have an EPS file with incorrect BoundingBox comments it would be possible to make some suggestions, but Ghostscript is doing exactly what it should do here.
[addendum]
(see comment below, this is in reply)
I suspect you need to change your process in some way then. One solution is to have the PDF start by filling the entire page with white. Contrary to many people's expectations that counts as making a mark on the page so the entire page would then be considered as the BoundingBox.
As long as you are using the Ghostscript eps2write device you could also parse the document for %%BeginPageSetup, the eps2write device still writes the original document size out in this section, Eg:
%!PS-Adobe-3.0 EPSF-3.0
%%Invocation: path/gswin32c -dDisplayFormat=198788 -dDisplayResolution=96 -sDEVICE=eps2write -sOutputFile=? ?
%%BoundingBox: 101 132 191 256
%%HiResBoundingBox: 101.80 132.80 190.30 255.20
%%Creator: GPL Ghostscript GIT PRERELEASE 951 (eps2write)
....
....
%%EndProlog
%%Page: 1 1
%%BeginPageSetup
4 0 obj
<</Type/Page/MediaBox [0 0 300 300]
/Parent 3 0 R
/Resources<</ProcSet[/PDF]
>>
/Contents 5 0 R
>>
endobj
%%EndPageSetup
You can see here that the original media size was 300x300, even though the BoundingBox correctly reflects the marks made on the page. Note! This is characteristic of EPS files produced by the current version of eps2write, it won't work for EPS files from other sources and may not work with eps2write in the future.
Other than that you're stuck with finding the media size from the input and passing it separately to the program doing the insertion, presumably by putting the data in some other text file to accompany the EPS. Or, of course, manually or programmatically editing the urx,ury co-ordinates of the BoundingBox.
Ghostscript isn't going to do this for you I'm afraid.

Related

Ghostscript converting PostScript to PDF seems to ignore the page size / BoundingBox

I have created a PostScript file from a TIFF image using ImageMagick.
The command-line I am using is:
convert input.tif[0] -density 600 -alpha Off -size 5809x9408 -depth 16 intermediate.ps
This takes my input tiff image (just the main image, and not the thumbnail via using [0]) and creates a .ps file from the bitmap.
When I look at the header of my PostScript file, I can see that it has the correct page size:
%!PS-Adobe-3.0
%%Creator: (ImageMagick)
%%Title: (intermediate.ps)
%%CreationDate: (2017-05-22T08:43:44+10:00)
%%BoundingBox: -0 -0 697 1129
%%HiResBoundingBox: 0 0 697.08 1129
%%DocumentData: Clean7Bit
%%LanguageLevel: 1
%%Orientation: Portrait
%%PageOrder: Ascend
%%Pages: 1
%%EndComments
Yet, when I use GhostScript to convert this to a PDF, unless I go to a lot of trouble to specify otherwise, gs is cropping it and putting it on a US Letter sized page.
gs -dPDFA=1 -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sDefaultRGBProfile=AdobeRGB1998.icc -dOverrideICC -sOutputFile=output.pdf -r600 -P PDFA_def.ps intermediate.ps
When I open the resulting PDF, the crop box is 612 x 792 pt wich is US Letter. It should be 697 x 1129 pt, the size of the Bounding Box in the PostScript file.
I have created a custom .joboptions file using Acrobat Distiller that sets image compression and the like, and in this file if I specify the page size at the end, then the resulting PDF comes out the correct size:
<<
/HWResolution [600 600]
/PageSize [697.080 1128.960]
>> setpagedevice
Now this isn't a huge issue for a one-off conversion, but I have to convert a large number of images and I don't want to set the page size manually for every single file.
The lines you quote above are comments and, from the comments present, suggest that this is an EPS file, not a PostScript program.
The main difference is that EPS is 'encapsulated' which means its intended to be placed verbatim inside a PostScript program. The enclosing program contains the intelligence regarding the media size, and arranges to set the context such that the EPS is scaled, rotated, translated so that it fits appropriately on the media.
In order to do this successfully, the EPS file must follow certain rules; in particular it must not set any media size itself (because that would mess with the enclosing program).
So it seems likely to me that what you have is an EPS file which does not request any media size at all. So its hardly surprising that you have to tell Ghostscript what you want to do with it.
Now in order for the enclosing program to place the EPS it needs to know its characteristics, the size and shape of the content. That's what the comments are for. Ordinarily an EPS file is read by an application (eg MS Word, LibreOffice etc) which parses out those comments and uses the information when generating the final PostScript program. The reason an EPS uses comments to store this information is precisely so that it has no effect on the actual content of the EPS and so the entire EPS can be included without further processing by the application.
The short answer is that if you read the Ghostscript documentation here you will find descriptions of the EPSCrop and EPSFitPage command line switches which will do all the work for you.

Raw pdf color conversion (with known conversion formula) from RGB to CMYK

This question is related to
Script (or some other means) to convert RGB to CMYK in PDF?
however way more specific. Consider that I am not an expert in print production ;)
Situation: For printing I am only allowed to use two colors, Cyan and Black. The printery requests the final PDF to be in DeviceCMYK with only the Channels C and K used.
pdflatex automatically does that (with the xcolor package) for all fonts and drawn objects, however I have more than 100 sketches/figures in PDF format which are embedded in the manuscript. Due to an admittedly badly designed workflow (late realization that Inkscape cannot export CMYK PDFs), all these figures were created in Inkscape, and thus are RGB PDFs.
However, the only used colors within Inkscape were RGB complements of CMY(K), e.g. 100% Cyan is (0,255,255) RGB and 50% K is (127,127,127) etc.
Problem: I need to convert all these PDF figures from RGB to DeviceCMYK (or alternatively the whole PDF of the final manuscript) with a specific conversion formula.
I did a lot of google research and tried the often suggested ways of using e.g. Ghostscript or various print production tools in Adobe Acrobat, however all of the conversion techniques I found so far wanted to use ICC color profiles or used some other conversion strategy which filled the channels MY and spared some C and K, for example.
I know the exact conversion formula for the raw color numbers from our Inkscape-RGBs to the channels C and K, however I do not know or find any program or tool that allows me to manually specify conversion formulas.
Question: Is there any workflow to convert my PDFs from RGB to C(MY)K manually with my own specific conversion formula for the raw numbers with the converted PDF being in DeviceCMYK using a tool, script or Adobe product?
Due to the large number of figures I would prefer a batched solution which doesn't require too much coding from my side, but if it should be the only solution, I'd also be open minded for a workflow like "load/convert/save" within a program for every single figure or writing a small program with an easy-to-handle C++ PDF API for example.
Limitations and additional info: A different file format (like TikZ figures) is not possible any more since it does not work perfectly and the necessary adaptions to the figures would create too much overhead. A maybe helpful information: Since the figures are created in Inkscape, there are no raster images within the PDFs. I also do not want all figures to be converted to raster images during the color conversion.
Edit:
I have created an example of a RGB PDF-figure created with inkscape.
I also did a manual object-by-object color conversion to a CMYK-PDF with Illustrator, to show how the result should look like. Illustrator stores the axial shading in a DeviceN colorspace with the colors cyan and black, which is close enough^^
Here is an idea, I think it will work if your PDF files are using exclusively the colorspaces DeviceGray, DeviceRGB and DeviceCMYK:
1- Convert all your PDF files to Postscript (with pdf2ps from ghostscript for example)
2- Write a Postscript program that redefines the operators setrgbcolor, setgray and setcolor with your own implementation in the Postscript language, your implementation will internally use setcmykcolor and it will compute the values using your custom formula.
Here is an example for redefining the setgray operator:
% The operator setcmykcolor expects 4 values in the stack
% When setgray is called, we can expect to have 1 value in the stack, we will
% use it for the black component of cmyk by adding 3 zeros and rolling the
% top 4 elements of the stack 3 times
/setgray { 0 0 0 4 3 roll setcmykcolor } bind def
3- Paste your Postcript program at the begining of each resulting ps file from step 1.
4- Convert all your files back to PDF (with ps2pdf for example)
See it in action by saving this piece of code as sample.ps:
/setgray { 0 0 0 4 3 roll setcmykcolor } bind def
0.5 setgray
0 0 moveto
600 600 lineto
stroke
showpage
Convert it to PDF with ghostscript using this command line (I used version 9.14):
gswin64c.exe -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=sample.pdf sample.ps
The resulting PDF will have the following page content:
q 0.1 0 0 0.1 0 0 cm
/R7 gs
10 w
% The K operator is the PDF equivalent of setcmykcolor in postscript
0 0 0 0.5 K
0 0 m
3000 3000 l
S
Q
As you can see, the ps-> pdf conversion will preserve the cmky colors specified in postscript with the setcmykcolor operator.
Maybe you can post your formula as a new question and someone could help you out translating it to postscript.
Since you have access to Illustrator, you might want to try importing the PDF into Illustrator and using Illustrator's scripting capabilities to iterate over the elements and replace fill/stroke RGB colors with their CMYK replacement colors.
The difficulty will be with the shading patterns (Gradients) used in the PDF; if they are imported as GradientColor, then in theory it's a matter of digging into the GradientColor to find the base RGB colors and substitute their CMYK replacement.
A very similar problem was solved using the ActivePDF.dll with C++ (or C#??).

Add white border to PDF (change paper format)

I have to change a given PDF from A4 (210mm*297mm) to 216mm*303mm.
The additional 6 mm for each dimension should be set as white border of 3mm on each side. The original content of the PDF pages should be centered on the output pages.
I tried with convert:
convert in.pdf -bordercolor "#FFFFFF" -border 9 out.pdf
This gives me exactly the needed result but I loose very much sharpness of the original images in the PDF. It is all kind of blurry.
I also checked with
convert in.pdf out.pdf
which does no changes at all but also screws up the images.
So I tried Ghostcript but did not get any result. The best approach I found so far from a German side is:
gs -sOutputFile=out.pdf -sDEVICE=pdfwrite -g6120x8590 \
-c "<</Install{1 1 scale 8.5 8.5}>> setpagedevice" \
-dNOPAUSE -dBATCH in.pdf
but I get Error: /typecheck in --.postinstall--.
By default, Imagemagick converts input PDF files into images with 72dpi. This is awfully low resolution, as you experienced firsthad. The output of Imagemagick is always a raster image, so if your input PDF was text, it will no longer be.
If you don't mind the output PDF's getting bigger, you can simply increase the ratio Imagemagick is probing the original PDF using -density option, like this:
convert -density 600 in.pdf -bordercolor "#FFFFFF" -border 9 out.pdf
I used 600 because it is the sweet spot that works well for OCR. I recomment trying 300, 450, 600, 900 and 1200 and picking the best one that doesn't get unwieldably huge.
Shifting the content on the media is not especially hard, but it does mean altering the content stream of the PDF file, which most PDF manipulation packages avoid, with good reason.
The code you quote above really won't work, it leaves garbage on the operand stack, and the PLRM explicitly states that it is followed by an implicit initgraphics which will reset all the standard parameters anyway.
You could try instead setting a /BeginPage procedure to translate the origin, which will probably work:
<</BeginPage {8.5 8.5 translate} >> setpagedevice
Note that you aren't simply manipulating the original PDF file; Ghostscript takes the original PDF file, interprets it into graphics primitives, then reassembles those primitives into a new PDF file, this has implications... For example, if an image is DCT encoded (a JPEG) in the original, it will be decompressed before being passed into the output file. You probably don't want to reapply DCT encoding as this will introduce visible artefacts.
A simpler alternative, but involving multiple processing steps and therefore more potential for problems, is to first convert the PDF to PostScript with the ps2write device, specifying your media size, and also the -dCenterPages switch, then use the pdfwrite device to turn the resulting PostScript into a new PDF file.
Instead of
-g6120x8590 \
-c "<</Install{1 1 scale 8.5 8.5}>> setpagedevice"
(which is wrong), you should use:
-g6120x8590 \
-c "<</Install{8.5 8.5 translate}>> setpagedevice"
or
-g6120x8590 \
-c "<</Install{3 25.4 div 72 mul dup translate}>> setpagedevice"
(which lets Ghostscript calculate the "3mm == 8.5pt" itself...)

How to correctly crop PDF with uneven text margins

I have PDF like this:
where all margins relative to text content are different on per page basis.
Is there any tool that can correct this for me?
I know Scan Tailor can do this on bitmap, but this is PDF with just text layer, so I'm not after solution that would involve bitmaps at any stage
Update:
OK, for me there is no need to try to run PDFCrop on Windows, as main feature is provided by ghostscript. This command (taken from pdfcrop perl script):
gswin32c.exe -dSAFER -dNOPAUSE -dBATCH -q -r72 -sDEVICE=bbox -f input.pdf 2> bbox.txt
produces bbox.txt file, with text content dimensions, as if there are no margins (bounding box). It looks like this:
%%BoundingBox: 91 259 474 757
%%HiResBoundingBox: 91.000000 259.000000 474.000000 757.000000
%%BoundingBox: 85 224 470 768
%%HiResBoundingBox: 85.000000 224.000000 469.375000 768.000000
%%BoundingBox: 102 217 489 768
%%HiResBoundingBox: 102.000000 217.000000 488.457031 768.000000
...
where first to numbers are lower left corner x,y values and rest two and upper right, measuring from lower left edge (in pixels/points).
This can be read by user's language of choice and then bboxes corrected as desired and passed again to ghostscript as i.e. referenced here: Cropping a PDF using Ghostscript 9.01
If you are sure that only text is involved (and not images with text drawn on it or paths drawing symbols), you can quite easily build such a tool in Java using iText (or most likely also some .NET language using iTextSharp) using the parser package functionality.
The book iText in Action, 2nd edition, in chapter 15.3.4 shows how to find the text margins, and the sample code can be found in ShowTextMargins.java in the SourceForge iText SVN repository.
By manipulating the MediaBox entries of the individual pages you can then adapt the margins as desired.

Convert multipage PDF to PNG and back (Linux)

I have a lot of PDF documents that I want to convert to PNG, edit in Gimp, and then save back to the multipage Acrobat file. I'm filling out forms and adding scanned signature, trying to avoid printing, signing, then scanning back in, with the ability to type the information I need to enter.
I've been trying to use Imagemagick to convert to png files, which seems to work fine. I use the command convert -quality 100 -density 300x300 multipage.pdf single%d.png
(I'm not really sure if the quality parameter is right for png).
But I'm having problems with saving back to PDF. Some of the files have the wrong page size, and I've tried every command and procedure I can find, but there are always a few odd sizes. The resolution seems to vary so that it looks good at a certain zoom level, but either a few pages are specified at about 2" wide, or they are 8.5x11 but the others are about 35" wide. I've tried making sure Gimp had the canvass size and resolution correct, and to save the resolution in the file, but that doesn't seem to matter.
The command I use to save the files is convert -page letter -adjoin single*.png multipage.pdf I've tried other parameters, but none seemed to matter.
If anyone has any ideas or alternatives, I'd appreciate it.
"I'm not really sure if the quality parameter is right for PNG."
For PNG output, the -quality setting is very unlike JPEG's quality setting (which simply is an integer from 0 to 100).
For PNG it is composed by two single digits:
The first digit (tens) is (largely) the zlib compression level, and it may go from 0 to 9.
(However the setting of 0 has a special meaning: when you use it you'll get Huffman compression, not zlib compression level 0. This is often better... Weird but true.)
The second digit is the PNG data encoding filter type (before it is compressed):
0 is none,
1 is "sub",
2 is "up",
3 is "average",
4 is "Paeth", and
5 is "adaptive".
In practical terms that means:
For illustrations with solid sequences of color a "none" filter (-quality 00) is typically the most appropriate.
For photos of natural landscapes an "adaptive" filtering (-quality 05) is generally the best.
"I'm having problems with saving back to PDF. Some of the files have the wrong page size, and I've tried every command and procedure I can find [...] but either a few pages are specified at about 2" wide, or they are 8.5x11 but the others are about 35" wide."
Not having available your PNG files, I created a few simple ones with different dimensions to verify the different commands (as I wasn't sure myself any more). Indeed, the one you used:
convert -page letter -adjoin single*.png multipage.pdf
does create all PDF pages in (same) letter size, but it places my sample of (differently sized) PNGs always on the lower left corner of the PDF page. (Should a PNG exceed the PDF page size, it does scale them down to make them fit -- but it doesn't scale up smaller PNGs to fill the available page space.)
The following modification to the command will place the PNGs into the center of each PDF page:
convert \
-page letter \
-adjoin \
single*.png \
-gravity center \
multipage.pdf
If this is still not good enough for you, you can enforce a (possibly non-proportional!) scaling to almost fill the letter area by adding a -scale '590!x770!' parameter (this will leave a border of 11 pt at each edge of the page):
convert \
-page letter \
-adjoin \
single*.png \
-gravity center \
-scale '590!x770!' \
multipage.pdf
To leave away the extra border, use -scale '612!x792!'. -- Should you want only upward scaling to happen if required while keeping the aspect ratio of the PNG, use -scale '590<x770<':
convert \
-page letter \
-adjoin \
single*.png \
-gravity center \
-scale '590<x770<' \
multipage.pdf
Why not just use Xournal? That's what I use to annotate PDFs