Convert PDF to PS in Ghostscript and preserve CMYK split

I have an RGB PDF which I've preflighted in Adobe Acrobat Pro to a PDF/X-1a compliant PDF in US Web Coated SWOP v2.
The PDF now has 4 plates (C/M/Y/K):
C plate is empty
M plate has 100% of a red image
Y plate has 100% of the same red image
K plate has 100% of black text on page (text is not on any other plate)
I'm now trying to convert that PDF into a PS using Ghostscript.
I've tried:
gs -dNOPAUSE -dBATCH -sDEVICE=ps2write -sProcessColorModel=DeviceCMYK -sOutputFile=output.ps input.pdf
But then when I distill this PS back to a PDF the text is on all the plates and not just the K plate.
I've also run the conversion through this online tool:
http://pdf.my-addr.com/free-online-pdf-to-ps-convert.php
When I distill the PS it generates, the plate breakdown is preserved. They are also using Ghostscript to create the PS.
So I'm assuming there is some setting I am missing.
Does anyone know?
Update 1
I tried pdftops too, and again it takes my K plate and spreads it across all CMYK plates.
What secret magic are they doing on that web site to preserve plates?!
Update 2
The only significant difference I can see is that I'm using
%%Creator: GPL Ghostscript 905 (pswrite)
and that website is using
%%Creator: GPL Ghostscript 871 (pswrite)
Could it be a version thing, or are they doing something I'm not?

Ghostscript 9 and above use much better colour management than previous versions, but you have to get the ICC profiles correct. I'd guess you are using the default profiles, so the first thing I'd suggest is moving to the current version of Ghostscript, which is 9.07; I think a few changes were made to the default profiles.
It's also possible that the PDF file now has an input RGB profile associated with it, which Ghostscript is now using whereas previously it didn't. I'd need to see the file to tell what is really going on, but I have a sneaking suspicion that your 'pre-flight' conversion is causing the problem. What happens if you use the original PDF file?
I very much doubt that the PDF file actually contains CMYK colour components. I would imagine all that has happened is that different profiles have been inserted into the file that control the conversion from RGB to CIE and from CIE to CMYK.
In passing, don't use pswrite. It's a terrible low-level output that converts much of the content into images. It produces large PostScript that processes very slowly and doesn't scale well (i.e. if the printer is a different resolution). Use the ps2write device instead.
By the way, since you've already used Acrobat, why don't you just use 'Save As' PostScript from there?
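Since the question compares the %%Creator comments of two PostScript files, it can help to read that comment back from whatever file you have just generated, to confirm which Ghostscript version and device (pswrite or ps2write) actually wrote it. A minimal sketch; `ps_creator` is a made-up helper name and `output.ps` is simply the file name used in the question:

```shell
# Print the first %%Creator DSC comment found in a PostScript file.
# The helper name is hypothetical; it only reads the file's header comments.
ps_creator() {
  sed -n '/^%%Creator:/{p;q;}' "$1"
}

# Usage (output.ps is the file written by gs):
#   ps_creator output.ps
```

If the line still names pswrite, the file did not come from the ps2write command shown in the question.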

Related

Scaling PDF file using ghostscript

Our system takes 8.5 x 11 PDF files (only) and does things to them. Sometimes customers hand us files to manipulate into the right shape. We're working to automate scaling non-standard sized PDF files into 8.5 x 11.
We've been able to handle most files we've tested with ghostscript, but we have this one customer submitted file that we are unable to handle. (And unfortunately we can't recreate the condition and, of course, can't post the customer's data.)
The file is PDF v1.7 and contains seven 8.5x11 pages followed by four pages that are 25.5 x 45.33 inches. I don't know how they were generated (Adobe Acrobat 10.1.2 per pdfinfo).
We have gradually added a series of parameters to our gs command until we arrived at this:
gs -sDEVICE=pdfwrite -sOutputFile=$final_file -dBATCH -dNOPAUSE -sPAPERSIZE=letter -q -r720 -g6120x7920 -dPDFFitPage -dFIXEDMEDIA $files_to_convert
This seems to work fine for our other files, but for this ONE file, the 25.5 x 45.33 pages are not scaled to letter size. Here are the measurements for pages 7 and 8 of the output file, per pdfinfo:
Page 7 size: 612 x 792 pts (letter)
Page 7 rot: 0
Page 8 size: 1836 x 3264 pts
Page 8 rot: 0
I've read that PostScript has Policies, PageSize options, but I'm not aware of such a thing with PDF. And if it exists, I don't know how to alter it using ghostscript.
How can I make sure all pages are scaled to letter?
Well, Ghostscript uses PostScript as its scripting language, so anything you can do in PostScript you can do to a PDF file.
I really wouldn't use -g with pdfwrite, because -g specifies pixels, and since pdfwrite is a vector device that doesn't really work well. Use DEVICEHEIGHTPOINTS and DEVICEWIDTHPOINTS instead.
Don't set -sPAPERSIZE either; you can't set the media to be letter in one place and something different (the -g switch) elsewhere.
It's not really possible to tell you exactly what's going on with your PDF file without seeing it, and you haven't really explained what's wrong. You imply that the pages are not being scaled, but you don't say what size they are being drawn at. You also don't say why you think the pages are 'legal' size when viewed in Acrobat.
If you are saying that the pages in question are 'legal' but the media is much larger, then that is entirely possible and would suggest that the pages have a CropBox. Ghostscript uses the MediaBox for page sizes, Acrobat uses a plethora of different boxes, but usually defaults to the CropBox.
If you want Ghostscript to use the CropBox then just tell it -dUseCropBox.
Alternatively post an example somewhere and I can look at it.
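Putting the suggestions above together, here is a rough sketch of how the command line might be rebuilt with point-based media dimensions instead of -g and -sPAPERSIZE. It assumes letter is 8.5 x 11 inches at 72 points per inch; `in_to_pt` and `letter_cmd` are hypothetical helper names, the file names are placeholders, and the script only prints the command so it can be inspected before being run:

```shell
# Convert inches to PostScript points (72 per inch), rounded to the nearest integer.
in_to_pt() {
  awk -v inches="$1" 'BEGIN { printf "%d\n", inches * 72 + 0.5 }'
}

# Print (not run) a pdfwrite command that fixes the media at letter size
# and asks Ghostscript to fit each input page to it.
# $1 = output file, $2 = input file (both placeholders).
letter_cmd() {
  w=$(in_to_pt 8.5)   # 612
  h=$(in_to_pt 11)    # 792
  echo "gs -sDEVICE=pdfwrite -dBATCH -dNOPAUSE -q" \
       "-dDEVICEWIDTHPOINTS=$w -dDEVICEHEIGHTPOINTS=$h" \
       "-dFIXEDMEDIA -dPDFFitPage -sOutputFile=$1 $2"
}

letter_cmd out.pdf in.pdf
```

The same conversion gives 1836 x 3264 pts for the 25.5 x 45.33 inch pages, which matches the pdfinfo output in the question. If those pages turn out to carry a CropBox, -dUseCropBox can be added to the printed command.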

ImageMagick convert produces a darker CMYK PDF than PhotoShop

The ImageMagick (IM) result of this command
convert myRGB.png -colorspace cmyk cmyk.pdf
is not as bright or as close to the screen colors as a Photoshop produced CMYK PDF. myRGB.png is a PNG file produced using GIMP.
I don't own Photoshop, and would like to stick with open source tools.
The current Ubuntu release of IM is 6.7.7. That IM version produces a very dark, totally unusable CMYK PDF.
I built 7.0.2-6 Q16 from source on Ubuntu 14.0.4, after also building LCMS package from source, and the above command works better, but the CMYK PDF as stated above is less bright and less close to the screen colors than the similar Photoshop output. E.g. blacks are not totally black; the sky color is dull blue instead of bright blue/cyan.
I've tried using ICC files downloaded from Adobe as in the following
convert myRGB.png -colorspace cmyk -profile WebCoatedSWOP2006Grade5.icc cmyk.pdf
I've tried this command with all 14 Adobe ICC files and there is no difference in any of them. Although, I admit I do not understand under what circumstances the ICC comes into play or if it is appropriate to this problem at all.
The simple question is why does IM convert tool not match the Photoshop results for CMYK?
The second question is, if IM can't be made to do it: is there any open source tool or set of tools that can match the Photoshop results for producing a CMYK PDF from an RGB PNG?
There are two applications involved, as you presumably know since you tagged this with Ghostscript. You haven't said which version of Ghostscript you have installed but the first thing I would do is remove ImageMagick from the equation.
Find out whether IM is having Ghostscript produce RGB or CMYK output; my bet is that it is getting RGB from GS. You'll need to find out what Ghostscript command line IM is using, and I can't tell you how to do that. If the Ghostscript output is RGB, that would explain why altering the IM settings makes no difference.
Proceeding on the assumption that the above is correct, use the png16m device in Ghostscript to produce RGB PNG files directly; this reduces the scope of the problem:
gs -sDEVICE=png16m -o out.png input.pdf
Now, you don't say what version of Ghostscript you have installed, but assuming it's relatively recent you can look in the /ghostpdl/doc directory and find considerable information on using colour management in Ghostscript; the document GS9_Color_Management.pdf may be helpful. It will certainly give you a myriad of opportunities to alter the output.

How to confirm a TrueType PDF font is missing glyphs

I have a PDF which renders fine in Acrobat but fails to print during the PDF to PS conversion process on our printer's RIP. After uncompressing with pdftk and editing I've found if I replace the usage of a certain font it will print.
The font is a strange one, a TrueType subset with a single character (space).
If I pass the PDF through Ghostscript it reports no errors, however an Acrobat pre-flight check will report a missing glyph for space. This error is not reported for the original file. I'm just using a basic command: gswin32c -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -o gs.pdf original_sample.pdf
I've pulled out the font data from the original PDF and saved it. Running TTFDUMP.exe produces an interesting result where it seems that the 'glyf' table is missing:
4. 'glyf' - chksm = 0x00000000, off = 0x00000979, len = 0
5. 'head' - chksm = 0xE463EA67, off = 0x00000979, len = 54
Just wondering, am I interpreting this result correctly? Is it valid to run TTFDUMP like this on extracted data from a PDF? I think a 'glyf' table is required based on the spec, at least for the first 4 necessary characters.
TTFDUMP run on the ghostscript PDF produces a similar result but with a 1-byte 'glyf' table.
If so it seems that Acrobat doesn't particularly care about the missing space while other programs (including the printer) do. It's odd it isn't reported as missing though until it runs through Ghostscript.
The PDF is created by Adobe InDesign and the font is copyrighted like most so I can't share it.
Edit - I've accepted Ken's answer as he helped me on the Ghostscript bug tracker. In summary, it seems the font is broken as suspected due to the missing glyf table. Until I hear otherwise I'll have to suppose this is a bug in InDesign, and will continue investigating.
Yes, you can run ttfdump on an embedded subset font; it's still a perfectly valid font.
A missing glyph is not in itself a problem, because the .notdef glyph is used instead. A missing .notdef, however, means the font isn't legal.
I think you are mistaken about the legality of sharing the PDF file (from the point of view of font embedding). Practically every PDF file you see will contain copyrighted fonts, but these are permitted to be embedded and distributed as part of a PDF (or indeed PostScript) file. TrueType fonts contain flags which control the DRM of the font, and which can deny embedding in PDF (or other formats). Ghostscript honours these embedding flags in the font, as do Acrobat Distiller and other Adobe products.
There were some fonts which inadvertently shipped with DRM which prevented embedding, and there's a list somewhere of these, along with an explicit statement from the font foundry that it's permissible to embed these fonts. I think this was somewhere on the Adobe web site a few years back.
So if you have a PDF file with the font embedded in it (especially if it was produced by an Adobe application) then I would be comfortable that it's legal to share.
I'm having some trouble figuring out what the problem actually is, and how you are using Ghostscript. If you are running the PDF->PS and then back to PDF then all bets are off frankly. Round-tripping files will often provoke problems.
In any event I'm happy to look at the file but you will have to make it available.
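As a cross-check on the ttfdump output quoted in the question, the table directory of the extracted font data can also be listed with standard tools. This is only a sketch, assuming the usual sfnt layout (a 12-byte header with the table count at byte 4, followed by 16-byte records holding tag, checksum, offset and length, all big-endian); `list_tables` is a hypothetical helper name and `font.ttf` stands for whatever file the font stream was saved to:

```shell
# List each table tag and its length from an sfnt (TrueType) file.
# Assumed layout: 12-byte header, then 16-byte records of
# tag(4) checksum(4) offset(4) length(4), big-endian.
list_tables() {
  font=$1
  # numTables is the 16-bit value at byte offset 4.
  num=$(od -An -tu1 -j4 -N2 "$font" | awk '{ printf "%.0f\n", $1 * 256 + $2 }')
  i=0
  while [ "$i" -lt "$num" ]; do
    rec=$((12 + 16 * i))
    # The tag is four ASCII bytes at the start of the record.
    tag=$(dd if="$font" bs=1 skip="$rec" count=4 2>/dev/null)
    # The length is the last four bytes of the record.
    len=$(od -An -tu1 -j"$((rec + 12))" -N4 "$font" |
          awk '{ printf "%.0f\n", (($1 * 256 + $2) * 256 + $3) * 256 + $4 }')
    printf '%s %s\n' "$tag" "$len"
    i=$((i + 1))
  done
}

# Usage (font.ttf is a placeholder for the extracted font stream):
#   list_tables font.ttf | awk '$1 == "glyf" && $2 == 0 { print "empty glyf table" }'
```

A 'glyf' entry with length 0 would agree with the ttfdump listing in the question.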

Prevent Ghostscript from rasterizing text?

I'm trying to convert PDFs to PCL (using Ghostscript, but I'd love to hear alternative suggestions), and every driver (Ghostscript device), including all of the built-ins and gutenprint, generates PCL files many times larger than the input PDF. (This is the problem - I need my PCL to be about as small as the input.)
Given that the text doesn't show up in the PCL file, I guess that Ghostscript is rasterizing the text. Is there a way to prevent GS generally, or just gutenprint, from doing that? I'd rather either have it embed the fonts, or not even embed the fonts (leave it to the printer to render the fonts)?
Unfortunately, there doesn't seem to be any documentation on this point.
There are 3 (I think) types of font in PCL. There are rendered bitmaps, TrueType fonts (in later versions) and the HPGL stick font.
PDF and PostScript have Type 1, Type 2 (CFF), Type 3 and Type 42 (TrueType, but not the same as PCL) fonts, and CIDFonts based on any of the preceding types.
The only font type the two have in common is TrueType, so in order to retain text, any font which was not TrueType would have to be converted into TrueType. This is not a simple task. So Ghostscript simply renders the text, which is guaranteed to work.
PDF is, in general, a much richer format than PCL; there are many PDF constructs (fonts, shading, stroke/fill in a single operation, transparency) which cannot be represented in PCL. So it's entirely possible that the increase in size has nothing to do with text and fonts.
In fact, I believe that the PXL drivers in Ghostscript simply render the entire page to a bitmap at the required resolution, and then wrap that up with enough PCL to be successfully sent to a printer. (I could be mistaken on this point though)
Basically, you are not going to get PCL of a similar size to your PDF out of Ghostscript.
Here is a way to 'prevent Ghostscript from rasterizing text'. But its output will be PostScript. You may, however, succeed in converting this PostScript to PCL5e in an additional step.
The method will convert all glyphs into outline shapes for its PostScript output, and it does not work for its PDF or PCL output. The key here is the -dNOCACHE parameter:
gs -o somepdf.ps -dNOCACHE -sDEVICE=pswrite somepdf.pdf
Of course, converting font glyphs to outlines will take more space than keeping the original fonts embedded, because "fonts" are a space-optimized concept to store, retrieve and render glyph shapes.
Once you have this PostScript, you may be able to convert it to PCL5e with the help of either of the methods you tried before for PDF input (including {Apache?} FOP).
However, I have no idea if the output will be much smaller than versions with rasterized fonts (or even wholly rasterized pages). But it may be worth a test.
Now vote down this answer too...
Update
Apparently, from version 9.15 (to be released during September/October 2014), Ghostscript will support a new command line parameter:
-dNoOutputFonts
which will cause the output devices pdfwrite, ps2write and eps2write "to 'flatten' glyphs into 'basic' marking operations (rather than writing fonts to the output)".
That means that the above command should be replaced by this:
gs -o somepdf.ps -dNoOutputFonts -sDEVICE=ps2write somepdf.pdf
Caveats: I've tested this with a few input files using a self-compiled Ghostscript based on current Git sources. It worked flawlessly in each case.

Need help/answers on PDF color separation

Using the following process:
A PDF is created by PDFCreator when a user prints something to the virtual printer
The PDF gets further processed with the integrated VBScript handler and passed over to Java, which does some processing with the PDF content
In the middle of the process an external application is called with the PDF that adds black text and graphics to the PDF
The PDFs are collected and once a week handed over to a print shop that uses a plate for each CMYK
The problem is: the print shop needs a color separated CMYK PDF, but the added black text & graphics from the external app should be the only content on the K plate (because we want to make a special print effect). All other content which has been printed via PDFCreator should be on the CMY plates only, so black must be emulated with those colors.
At the moment we are manually breaking the process before calling the external application and separating the colors via Adobe Creator Pro, but that is no future option because the whole process should be automated.
So basically I need a way to convert the CMYK PDFCreator PDF to a CMY-only version so the external app can throw in as much black K content as needed.
Is the PDF conversion the right direction to be heading in? Is there any way this can be done with Ghostscript? I read the gs documentation but got nowhere, as I only saw RGB to CMYK conversion but no CMYK to CMY with an empty K...
I believe that PDFCreator is simply a wrapper around Ghostscript, so you may have some joy on the Ghostscript mailing lists. It seems that gs does support some printers that just output CMY, so this functionality is likely to be available in there.
Wouldn't you be better off using a new separation called Black? Can't the print shop handle that?
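To make concrete what "CMYK to CMY with an empty K" means numerically, here is a rough sketch of the simplest possible folding: add K to each of C, M and Y and clamp at 100%. This is the same arithmetic PostScript defines when converting DeviceCMYK to DeviceRGB, not an ICC-managed conversion, and it is not something PDFCreator or Ghostscript is known to apply for this job; `cmyk_to_cmy` is a hypothetical helper and the values are fractions between 0 and 1:

```shell
# Fold the K component into C, M and Y, clamping each at 1.0.
# Arguments: c m y k as fractions in 0..1. Prints "c m y k" with K set to 0.
cmyk_to_cmy() {
  awk -v c="$1" -v m="$2" -v y="$3" -v k="$4" 'BEGIN {
    c += k; m += k; y += k
    if (c > 1) c = 1
    if (m > 1) m = 1
    if (y > 1) y = 1
    printf "%.2f %.2f %.2f 0.00\n", c, m, y
  }'
}

cmyk_to_cmy 0 0 0 1        # pure K becomes 1.00 1.00 1.00 0.00
cmyk_to_cmy 0.2 0.1 0 0.5  # mixed colour becomes 0.70 0.60 0.50 0.00
```

A real workflow would need colour management to get an acceptable composite black on press, so treat this purely as an illustration of the arithmetic.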