GhostScript: convert PDF 1.5 (made from TIFFs with ImageMagick) to PDF/A

I need to create a PDF/A from a folder of TIFF files.
Creating a PDF (1.5) works with ImageMagick.
But converting this PDF to a PDF/A using Ghostscript is a problem.
My GhostScript cmd:
-dPDFA=2 -dNOOUTERSAVE -sProcessColorModel=DeviceRGB -sDEVICE=pdfwrite -o "C:\Temp\TestData\TIFF to PDF Imagemagick\pdfa.pdf" "C:\Temp\TestData\TIFF to PDF Imagemagick\PDFA_def.ps" -dPDFACompatibilityPolicy=1 "C:\Temp\TestData\TIFF to PDF Imagemagick\test.pdf"
Also tried:
-dPDFA=2 -dBATCH -dNOPAUSE -sColorConversionStrategy=RGB -sDEVICE=pdfwrite -sPDFACompatibilityPolicy=1 -sOutputFile="C:\Temp\TestData\TIFF to PDF Imagemagick\pdfa.pdf" "C:\Temp\TestData\TIFF to PDF Imagemagick\PDFA_def.ps" "C:\Temp\TestData\TIFF to PDF Imagemagick\test.pdf"
My PDFA_def.ps is the GS standard one, with:
/ICCProfile (AdobeRGB1998.icc) % Customise
The created PDF does not pass the "Verify compliance with PDF/A-2b" preflight in Adobe Acrobat:
Error
Metadata missing (XMP)
PDF/A entry missing
Syntax problem: Indirect object “endobj” keyword not preceded by an EOL marker
Syntax problem: Stream dictionary improperly formatted
It also fails the https://www.pdf-online.com/osa/validate.aspx validator:
File pdfa.pdf
Compliance pdf1.5
Result Document does not conform to PDF/A.
Details
Validating file "pdfa.pdf" for conformance level pdf1.5
XML line 10:212: xmlParseCharRef: invalid xmlChar value 0.
The document does not conform to the requested standard.
The document's meta data is either missing or inconsistent or corrupt.
The document does not conform to the PDF 1.5 standard.
Done.
Also tried VeraPDF ....
What kind of settings have I forgotten?

Well, there are quite a few problems here.
You haven't said what version of Ghostscript you are using, nor have you supplied an example file to experiment with. You also haven't given the back channel output which might contain additional information.
You can't use the supplied model PDFA_def.ps without modification; at the very least you need to modify the /ICCProfile entry to point to a real, valid ICC profile. I suspect this has caused pdfwrite to abort PDF/A-2 production, which would normally be mentioned in the back channel output.
You haven't set -dColorConversionStrategy; just setting the ProcessColorModel is not sufficient, as pdfwrite will mostly ignore that. If you don't tell pdfwrite that you want colours converted to a different space, it will preserve them unchanged, regardless of the process color model.
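For reference, the only line in the stock PDFA_def.ps that usually has to change is the ICC profile entry; a sketch (the path here is an example and must point at a profile that actually exists on disk; Ghostscript accepts forward slashes on Windows):

```postscript
% Fragment of PDFA_def.ps: point /ICCProfile at a real, existing profile.
% The path below is an example only.
/ICCProfile (C:/Temp/TestData/AdobeRGB1998.icc) % Customise
def
```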

With this command it's now working:
-dPDFA=2 -sColorConversionStrategy=RGB -sDEVICE=pdfwrite -dPDFACompatibilityPolicy=1 -dNOPAUSE -dBATCH -o "C:\Temp\TestData\tiff2pdfa\pdfatest.pdf" "C:\Temp\TestData\tiff2pdfa\PDFA\PDFA_def.ps" "C:\Temp\TestData\tiff2pdfa\test.pdf"
Thanks to:
Batch Convert PDF to PDF/A - MARK BERRY
But I still get some errors:
GPL Ghostscript 9.25: UTF16BE text string detected in DOCINFO cannot be represented
in XMP for PDF/A 1, discarding DOCINFO
Processing pages 1 through 56.
Page 1
GPL Ghostscript 9.25: Setting Overprint Mode to 1
not permitted in PDF/A-2, overprint mode not set
Should I be worried about this "Overprint Mode"?
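Putting the thread together, the end-to-end pipeline can be sketched roughly as below. The binary names, paths and the tiffs/ folder are assumptions (on Windows the binaries would be magick.exe and gswin64c.exe); the Ghostscript switch set mirrors the working command above.

```shell
# Sketch of the TIFF -> PDF -> PDF/A-2b pipeline discussed above.
WORK_PDF="work.pdf"   # intermediate ImageMagick output (example name)
OUT_PDF="pdfa.pdf"

# Step 1: folder of TIFFs -> ordinary PDF with ImageMagick.
if command -v magick >/dev/null 2>&1 && [ -d tiffs ]; then
  magick tiffs/*.tif "$WORK_PDF"
fi

# Step 2: PDF -> PDF/A-2 with Ghostscript. Note the argument order:
# PDFA_def.ps has to come before the input PDF.
GS_ARGS="-dPDFA=2 -dBATCH -dNOPAUSE -sDEVICE=pdfwrite \
-sColorConversionStrategy=RGB -dPDFACompatibilityPolicy=1 \
-o $OUT_PDF PDFA_def.ps $WORK_PDF"

if command -v gs >/dev/null 2>&1 && [ -f PDFA_def.ps ] && [ -f "$WORK_PDF" ]; then
  gs $GS_ARGS
fi
```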

Related

Use Ghostscript to convert each page of a PDF to images and the output is still PDF

I know that Adobe Acrobat Reader DC can select the Microsoft Print to PDF printer to output to a PDF file with Print As Image checked in the Advanced Print Setup dialog. However, I want to use a command to do this. I tried the following command, as a result it failed to convert each page to images (Note the output file is still PDF).
gs -o 0.999.watermask.compact.screen.pdf -sDEVICE=pdfwrite -dDetectDuplicateImages=true -dPDFSETTINGS=/screen 0.999.watermask.pdf
References
7.4 PDF file output
Your switches include -dDetectDuplicateImages=true, which under the circumstances should be superfluous, and the device selection can be from one of four, as pointed out by KenS.
gs -o 0.999.watermask.compact.screen.pdf -sDEVICE=pdfimage32 -dPDFSETTINGS=/screen 0.999.watermask.pdf
If you want to emulate MS Print As Image PDF on Windows you would find the result in some ways inferior (and often many times bigger). But for comparison it would be,
NOTE: "%%printer%%..." is for use inside a batch file; on the command line, use "%printer%...".
gswin64c.exe -sDEVICE=mswinpr2 -dNoCancel -o "%%printer%%Microsoft Print to PDF" -dPDFSETTINGS=/screen -f "0.999.watermask.pdf"

Ghostscript: convert PDF to EPS with embeded font rather than outlined curve

I use the following command to convert a PDF to EPS:
gswin32 -dNOCACHE -dNOPAUSE -dBATCH -dSAFER -sDEVICE=epswrite -dLanguageLevel=2 -sOutputFile=test.eps -f test.pdf
I then use the following command to convert the EPS to another PDF (test2.pdf) to view the EPS figure.
gswin32 -dSAFER -dNOPLATFONTS -dNOPAUSE -dBATCH -dEPSCrop -sDEVICE=pdfwrite -dPDFSETTINGS=/printer -dCompatibilityLevel=1.4 -dMaxSubsetPct=100 -dSubsetFonts=true -dEmbedAllFonts=true -sOutputFile=test2.pdf -f test.eps
I found that the text in the generated test2.pdf has been converted to outline curves. There are no fonts embedded anymore either.
Is it possible to convert PDF to EPS without converting text to outlines? I mean, to EPS with embedded fonts and text.
Also, after the conversion (test.pdf -> test.eps -> test2.pdf), the height and width of the PDF figure (test2.pdf) are a little bit smaller than in the original PDF (test.pdf):
test.pdf:
test2.pdf:
Is it possible to keep the width and height of the figure after conversion?
Here is the test.pdf: https://dl.dropboxusercontent.com/u/45318932/test.pdf
I tried KenS's suggestion:
gswin32 -dNOPAUSE -dBATCH -dSAFER -sDEVICE=eps2write -dLanguageLevel=2 -sOutputFile=test.eps -f test.pdf
gswin32 -dSAFER -dNOPLATFONTS -dNOPAUSE -dBATCH -dEPSCrop -sDEVICE=pdfwrite -dPDFSETTINGS=/printer -dCompatibilityLevel=1.4 -dMaxSubsetPct=100 -dSubsetFonts=true -dEmbedAllFonts=true -sOutputFile=test2.pdf -f test.eps
I can see that the converted test2.pdf has a very weird font:
that is different from the original font in test.pdf:
When I copy the text from test2.pdf, I only get a couple of symbols like:
✕ ✖ ✗✘✙ ✚✛
Here is the test2.pdf: https://dl.dropboxusercontent.com/u/45318932/test2.pdf
I was using the latest Ghostscript 9.15. So what is the problem?
I just noticed you are using epswrite, you don't want to do that. That device is terrible and has been deprecated (and removed now). Use the eps2write device instead (you will need a relatively recent version of Ghostscript).
There's nothing you can do with epswrite except throw it away; it makes terrible EPS files. It also can't make level 2 files, no matter what you set -dLanguageLevel to.
Oh, and don't use -dNOCACHE; that prevents fonts from being processed and decomposes everything into outlines or bitmaps.
UPDATE
You set subset fonts to true. By doing so the character codes which are used are more or less random. The first glyph in the document (say for example the 'H' in 'Hello World') gets the code 1, the second one (eg 'e') gets the code 2 and so on.
If you have a ToUnicode CMap, then Acrobat and other readers can convert these character codes to Unicode code points; without it, the readers have to fall back on heuristics, the final one being 'treat it as ASCII'. Because the encoding arrangement isn't ASCII, you get gibberish. MS Windows' PostScript output can contain additional ToUnicode information, but that's not something we try to mimic in ps2write. After all, presumably you had a PDF file already....
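The code assignment described above can be illustrated with a toy sketch (this is just the numbering scheme, not Ghostscript's actual implementation): each distinct glyph gets the next free code in order of first use, so the codes carry no ASCII meaning.

```shell
# Toy model of subset-font encoding: number glyphs in order of first use.
assign_codes() {
  next=1
  seen=" "
  printf '%s\n' "$1" | fold -w1 | while read -r ch; do
    case "$seen" in
      *" $ch "*) ;;                      # glyph already has a code
      *) echo "'$ch' -> code $next"
         seen="$seen$ch "
         next=$((next + 1)) ;;
    esac
  done
}
assign_codes "Hello"   # H, e, l, o get codes 1..4; the second l reuses l's code
```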
Every time you do a conversion you run the risk of this kind of degradation, you should really try and minimise this in your workflow.
The problem is even worse in this case: the input PDF file has a TrueType CID font. Basic language level 2 PostScript can't handle CIDFonts (IIRC these were introduced in PostScript version 2015). Since eps2write only emits basic level 2, it cannot write the font as a CIDFont, so instead it captures the glyph outlines and stores them in a Type 3 font.
However, our EPS/PS output doesn't attempt to embed ToUnicode information in the PostScript (it's non-standard, very few applications can make use of it, and it therefore makes the files larger for little benefit). In addition, CIDFonts use multiple (2 or more) bytes for the character code, so there's no way to encode the Type 3 fonts as ASCII.
Fundamentally you cannot use Ghostscript to go PDF->PS->PDF and still be able to copy/paste/search text, if the input contains CIDFonts.
By the way, there's no point in setting -dLanguageLevel at all. eps2write only creates level 2 output.
I used Inkscape to convert a .pdf to .eps: just open the .pdf file in Inkscape, choose high mesh in the import options, and save it as an EPS file.

Cropped PCL after gswin PDF to PCL conversion

I have a PDF, which I want to convert to PCL
I convert PDF to PCL using the following command:
(gs 8.70)
gswin32c.exe -q -dNOPAUSE -dBATCH \
-sDEVICE=ljetplus -dDuplex=false -dTumble=false \
-sPAPERSIZE=a4 -sOutputFile="d:\doc1.pcl" \
-f"d:\doc1.pdf" -c -quit
When I view or print the output PCL, it is cropped. I would expect the output to start right at the edge of the paper (at least in the viewer).
Is there any way to get the whole output without moving the contents of the page away from the paper edge?
I tried the -dPDFFitPage option, which works, but results in scaled output.
You are using -sPAPERSIZE=a4. This causes the PCL to render for A4 sized media.
Very likely, your input PDF is made for a non-A4 size. That leaves you with 3 options:
...you either use that exact page size for the PCL too (which your printer possibly cannot handle),
...or you have to add -dPDFFitPage (as you tried, but didn't like),
...or you skip the -sPAPERSIZE=... parameter altogether (which most likely will automatically use the same size as the PDF, and which your printer possibly cannot handle...)
Update 1:
In case ljetplus is not a hard requirement for your requested PCL format variant, you could try this:
gs -sDEVICE=pxlmono -o pxlmono.pcl a4-fo.pdf
gs -sDEVICE=pxlcolor -o pxlcolor.pcl a4-fo.pdf
Update 2:
I can confirm now that even the most recent version of Ghostscript (v9.06) cannot handle non-Letter page sizes for ljetplus output.
I'd regard this as a bug... but it could well be that it won't be fixed, even if reported at the GS bug tracker. However, the least that can be expected is that it will get documented as a known limitation for ljetplus output...

Handling (remapping) missing/problematic (CID/CJK) fonts in PDF with ghostscript?

In brief, I'm dealing with a problematic PDF, which:
Cannot be fully rendered in a document viewer like evince, because of missing font information;
However - ghostscript can fully render the same PDF.
Thus -- regardless of what ghostscript uses to fill in the blanks (maybe fallback glyphs, or a different method of accessing fonts) -- I'd like to be able to use ghostscript to produce ("distill") an output PDF where pretty much nothing is changed except font information added, so that evince can render the document in the same manner as ghostscript can.
My question is thus: is this possible to do at all; and if so, what would the command line be to achieve something like that?
Many thanks in advance for any answers,
Cheers!
Details:
I'm actually on an older Ubuntu 10.04, and I might be experiencing - not a bug - but an installation problem with evince (lack of poppler-data package), as noted in Bug #386008 “Some fonts fail to display due to “Unknown font tag...” : Bugs : “poppler” package : Ubuntu.
However, that is exactly what I'd like to handle, so I'll use the fontspec.pdf attached to that post ("PDF triggering the bug.", // v.) to demonstrate the problem.
evince
First, I open this pdf's page 3 in evince; and evince complains:
$ evince --page-label=3 fontspec.pdf
Error: Missing language pack for 'Adobe-Japan1' mapping
Error: Unknown font tag 'F5.1'
Error (7597): No font in show
Error: Unknown font tag 'F5.1'
Error (7630): No font in show
Error: Unknown font tag 'F5.1'
Error (7660): No font in show
Error: Unknown font tag 'F5.1'
...
The rendering looks like this:
... and it is obvious that some font shapes are missing.
Adobe acroread
Just a note on how Adobe's Acrobat Reader for Linux behaves; the following command line:
$ ./Adobe/Reader9/bin/acroread /a "page=3" fontspec.pdf
... generates no output to terminal whatsoever (for more on /a switch, see Man page acroread) -- and the program has absolutely no problem displaying the fonts.
Also, while I'd like to avoid the round trip to PostScript, note that acroread itself can be used to convert a PDF to PostScript:
$ ./Adobe/Reader9/bin/acroread -v
9.5.1
$ ./Adobe/Reader9/bin/acroread -toPostScript \
-rotateAndCenter -choosePaperByPDFPageSize \
-start 3 -end 3 \
-level3 -transQuality 5 \
-optimizeForSpeed -saveVM \
fontspec.pdf ./
Again, the above command line will generate no output to terminal; -optimizeForSpeed -saveVM are there because apparently they deal with fonts; the last argument ./ is the output directory (output file is automatically called fontspec.ps).
Now, evince can display the previously missing fonts in the fontspec.ps output - but again complains:
$ evince fontspec.ps
GPL Ghostscript 9.02: Error: Font Renderer Plugin ( FreeType ) return code = -1
GPL Ghostscript 9.02: Error: Font Renderer Plugin ( FreeType ) return code = -1
...
... and furthermore, all text seems to be flattened to curves in the postscript - so now one cannot select the text in the .ps file in evince anymore (note that the .ps file cannot be opened in acroread). However, one can convert this .ps back into .pdf again:
$ pstopdf fontspec.ps # note, `pstopdf` has no output filename option;
# it will automatically choose 'fontspec.pdf',
# and overwrite previous 'fontspec.pdf' in
# the same directory
... and now the text in the output of pstopdf is selectable in evince, all fonts are there, and evince doesn't complain anymore. However, as I noted, I'd like to avoid the round trip to PostScript files altogether.
display (from imagemagick)
We can also observe the page in the same document with ImageMagick's display (note that image panning from the command line using 'display' is apparently still not available, so I've used -crop below to adjust the viewport):
$ display -density 150 -crop 740x450+280+200 fontspec.pdf[2]
**** Warning: considering '0000000000 00000 n' as a free entry.
...
**** This file had errors that were repaired or ignored.
**** The file was produced by:
**** >>>> Mac OS X 10.5.4 Quartz PDFContext <<<<
**** Please notify the author of the software that produced this
**** file that it does not conform to Adobe's published PDF
**** specification.
... which generates some Ghostscript-ish errors - and results in something like this:
... where it's obvious that the missing fonts that evince couldn't render are now shown properly here, with ImageMagick's display.
ghostscript
Finally, we can use ghostscript as x11 viewer itself -- to observe the same page, same document:
$ gs -sDEVICE=x11 -g740x450 -r150x150 -dFirstPage=3 \
-c '<</PageOffset [-120 520]>> setpagedevice' \
-f fontspec.pdf
GPL Ghostscript 9.02 (2011-03-30)
Copyright (C) 2010 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
**** Warning: considering '0000000000 00000 n' as a free entry.
**** Warning: considering '0000000000 00000 n' as a free entry.
**** Warning: considering '0000000000 00000 n' as a free entry.
**** Warning: considering '0000000000 00000 n' as a free entry.
**** Warning: considering '0000000000 00000 n' as a free entry.
**** Warning: considering '0000000000 00000 n' as a free entry.
**** Warning: considering '0000000000 00000 n' as a free entry.
Processing pages 3 through 74.
Page 3
>>showpage, press <return> to continue<<
^C
... and results with this output:
In conclusion: ghostscript (and apparently by extension, imagemagick) can seemingly find the missing font (or at least some replacement for it), and render a page with that -- even if evince fails at that for the same document.
I would, therefore, simply like to export a PDF version from ghostscript, that would have only the missing fonts embedded, and no other processing; so I try this:
$ gs -dBATCH -dNOPAUSE -dSAFER \
-dEmbedAllFonts -dSubsetFonts=true -dMaxSubsetPct=99 \
-dAutoFilterMonoImages=false \
-dAutoFilterGrayImages=false \
-dAutoFilterColorImages=false \
-dDownsampleColorImages=false \
-dDownsampleGrayImages=false \
-dDownsampleMonoImages=false \
-sDEVICE=pdfwrite \
-dFirstPage=3 -dLastPage=3 \
-sOutputFile=mypg3out.pdf -f fontspec.pdf
GPL Ghostscript 9.02 (2011-03-30)
Copyright (C) 2010 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
**** Warning: considering '0000000000 00000 n' as a free entry.
**** Warning: considering '0000000000 00000 n' as a free entry.
**** Warning: considering '0000000000 00000 n' as a free entry.
**** Warning: considering '0000000000 00000 n' as a free entry.
**** Warning: considering '0000000000 00000 n' as a free entry.
**** Warning: considering '0000000000 00000 n' as a free entry.
**** Warning: considering '0000000000 00000 n' as a free entry.
Processing pages 3 through 3.
Page 3
**** This file had errors that were repaired or ignored.
**** The file was produced by:
**** >>>> Mac OS X 10.5.4 Quartz PDFContext <<<<
**** Please notify the author of the software that produced this
**** file that it does not conform to Adobe's published PDF
**** specification.
... but it doesn't work - the output file mypg3out.pdf suffers from the exact same problems in evince as noted previously.
Note: while I'd like to avoid the PostScript round trip, a good example of a gs command line for PDF to PS with font embedding is here: (#277826) pdf - How to make GhostScript PS2PDF stop subsetting fonts; but the same command-line switches for .pdf to .pdf do not seem to have any effect on the problem described above.
OK point 1; you CANNOT use Ghostscript and pdfwrite to create a PDF file 'without any additional processing'.
The way that pdfwrite and Ghostscript work is to fully interpret the incoming data (PostScript, PDF, XPS, PCL, whatever), creating a series of graphics primitives, which are passed to the pdfwrite device. The pdfwrite device then reassembles these into a brand new PDF file.
So it's not possible to take a PDF file as input and simply manipulate it; it will always create a new file.
Now, I would suggest that you upgrade your 9.02 Ghostscript to 9.05 to start with. Missing CIDFonts are much better handled in 9.05 (and will be further improved in 9.06 later this year). (The font you are missing 'Osaka Mono' is in fact a CIDFont, not a regular font)
Using the current bleeding edge Ghostscript code produces a PDF file for me which has the missing font embedded. I can't tell if this will work for you because my copy of evince renders the original file perfectly well.
Added later
Examining the original PDF file I see that the fonts there are indeed embedded (as I would expect, since they are subsets). So in fact as you say in your own answer above, the problem is not font embedding, but the use of CIDFonts.
My answer here will not help you, as pdfwrite will still produce a CIDFont in the output. Basically this is a flaw in your version or installation of evince.
The problem with 'remapping' the characters is that a font is limited to 256 glyphs, while a CIDFont has effectively no limit. So there is no way to put a CIDFont into a Font. The only way to do this would be to create multiple Fonts each of which contained a portion of the original, and then switch between them as required. Slow and klunky.
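The arithmetic behind that "multiple Fonts" idea is just banking: CID n would live in simple font number n/256, at single-byte code n%256. A toy sketch (the CID value is an arbitrary example, not taken from the document's fonts):

```shell
# Map a CID into a (font bank, single-byte code) pair, 256 glyphs per bank.
cid=12354              # an example two-byte CID
bank=$(( cid / 256 ))
code=$(( cid % 256 ))
echo "CID $cid -> font F$bank, code $code"   # prints: CID 12354 -> font F48, code 66
```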
If you convert to PostScript using the ps2write device then it will do this for you, but you stand a great risk that in the process it will convert the vector glyph data into bitmaps, which will not scale well.
Fundamentally you can't really achieve what you want to do (convert 1 CIDFont into N regular Fonts) with Ghostscript, or in fact with any other tool that I know of. While it's technically possible, there is no real point, since all PDF consumers should be able to handle CIDFonts. If they can't, then it's a bug in the PDF consumer.
Right, I got a bit further on this (but not completely) - so I'll post a partial answer/comment here.
Essentially, this is not a problem about font embedding in PDF - this is a problem with font mapping.
To show that, let's analyse the mypg3out.pdf, which was extracted by gs in the OP (from the 3rd page of the fontspec.pdf document):
$ pdffonts mypg3out.pdf
name type emb sub uni object ID
------------------------------------ ----------------- --- --- --- ---------
Error: Missing language pack for 'Adobe-Japan1' mapping
CAAAAA+Osaka-Mono-Identity-H CID TrueType yes yes yes 19 0
GBWBYF+CMMI9 Type 1C yes yes yes 28 0
FDFZUN+Skia-Regular_wght13333_wdth11999 TrueType yes yes yes 16 0
ZRLTKK+Optima-Regular TrueType yes yes yes 30 0
ZFQZLD+FPLNeu-Bold Type 1C yes yes yes 8 0
DDRFOG+FPLNeu-Italic Type 1C yes yes no 22 0
HMZJAO+FPLNeu-Regular Type 1C yes yes yes 10 0
RDNKXT+FPLNeu-Regular Type 1C yes yes yes 32 0
GBWBYF+Skia-Regular_wght13333_wdth11999 TrueType yes yes no 26 0
As the output shows - all fonts are, indeed, embedded; so something else is the problem. (It would have been more difficult to observe this in the complete fontspec.pdf, as there are a ton of fonts there, and a ton of error messages.)
The crucial point (I think) here, is that there is:
only one "Error: Missing language pack for 'Adobe-Japan1' mapping" message; and
only one CID TrueType font, which is CAAAAA+Osaka-Mono-Identity-H
There seems to be an obvious relationship between the CID TrueType and the 'Adobe-Japan1' mapping error; and I got that finally clarified by CID fonts - How to use Ghostscript:
CID fonts are PostScript resources containing a large number of glyphs (e.g. glyphs for Far East languages, Chinese, Japanese and Korean). Please refer to the PostScript Language Reference, third edition, for details.
CID font resources are a different kind of PostScript resource from fonts. In particular, they cannot be used as regular fonts. CID font resources must first be combined with a CMap resource, which defines specific codes for glyphs, before it can be used as a font. This allows the reuse of a collection of glyphs with different encodings.
All good - except here we're dealing with PDF fonts, not PostScript fonts as such; let's demonstrate that a bit.
For instance, 5.3. Using Ghostscript To Preview Fonts - Making Fonts Available To Ghostscript - Font HowTo describes how the Ghostscript-installed script called prfont.ps can be used to render a table of fonts.
However, here it would be easier with just Listing Ghostscript Fonts [gs-devel], and using resourcestatus operator to query for a specific font - which doesn't require a special .ps script:
$ gs -o /dev/null -dNODISPLAY -f mypg3out.pdf \
-c 'currentpagedevice (*) {=} 100 string /Font resourceforall'
...
Processing pages 1 through 1.
Page 1
URWAntiquaT-RegularCondensed
Palatino-Italic
Hershey-Gothic-Italian
...
$ gs -o /dev/null -dNODISPLAY -f mypg3out.pdf \
-c '/TimesNewRoman findfont pop [/TimesNewRoman /Font resourcestatus]'
....
Processing pages 1 through 1.
Page 1
Can't find (or can't open) font file /usr/share/ghostscript/9.02/Resource/Font/TimesNewRomanPSMT.
Can't find (or can't open) font file TimesNewRomanPSMT.
Can't find (or can't open) font file /usr/share/ghostscript/9.02/Resource/Font/TimesNewRomanPSMT.
Can't find (or can't open) font file TimesNewRomanPSMT.
Querying operating system for font files...
Loading TimesNewRomanPSMT font from /usr/share/fonts/truetype/msttcorefonts/times.ttf... 2549340 1142090 3496416 1237949 1 done.
We got a list of fonts; however, those are system fonts available to ghostscript - not the fonts embedded in the PDF!
(Basically,
gs -o /dev/null -dNODISPLAY -f mypg3out.pdf -c 'currentpagedevice (*) {=} 100 string /Font resourceforall' | grep -i osaka will return nothing, and
-c '/CAAAAA+Osaka-Mono-Identity-H findfont pop [/CAAAAA+Osaka-Mono-Identity-H /Font resourcestatus]' will conclude with "Didn't find this font on the system! Substituting font Courier for CAAAAA+Osaka-Mono-Identity-H.")
To list the fonts in the PDF, the pdf_info.ps script file from Ghostscript (not installed, in sources) can be used:
$ wget "http://git.ghostscript.com/?p=ghostpdl.git;a=blob_plain;f=gs/toolbin/pdf_info.ps" -O pdf_info.ps
$ gs -dNODISPLAY -q -sFile=mypg3out.pdf -dDumpFontsNeeded pdf_info.ps
...
No system fonts are needed.
$ gs -dNODISPLAY -q -sFile=mypg3out.pdf -dDumpFontsUsed -dShowEmbeddedFonts pdf_info.ps
...
Font or CIDFont resources used:
CAAAAA+Osaka-Mono
DDRFOG+FPLNeu-Italic
FDFZUN+Skia-Regular_wght13333_wdth11999
GBWBYF+CMMI9
GBWBYF+Skia-Regular_wght13333_wdth11999
GTIIKZ+Osaka-Mono
HMZJAO+FPLNeu-Regular
RDNKXT+FPLNeu-Regular
ZFQZLD+FPLNeu-Bold
ZRLTKK+Optima-Regular
So finally we can observe the CAAAAA+Osaka-Mono in Ghostscript - although I wouldn't know how to query more specific information about it from within ghostscript.
In the end, I guess my question boils down to: how could ghostscript be used, to map the glyphs from a CID embedded font - into a font with a different "encoding" (or "character map"?), which will not require additional language files?
Addendum
I have also experimented with these approaches:
pdffonts on the output here will not have Osaka-Mono listed, but it will still complain "Error: Missing language pack for 'Adobe-Japan1' mapping":
$ wget http://whalepdfviewer.googlecode.com/svn/trunk/cmaps/japanese/Adobe-Japan1-UCS2
$ gs -sDEVICE=pdfwrite -o mypg3o2.pdf -dBATCH -f mypg3out.pdf Adobe-Japan1-UCS2
Same as previously - this (via Ghostscript's "Use.htm") also makes Osaka-Mono disappear from the pdffonts list:
$ gs -sDEVICE=pdfwrite -o mypg3o2.pdf -dBATCH \
  -c '/CIDSystemInfo << /Registry (Adobe) /Ordering (Unicode) /Supplement 1 >>' \
  -f mypg3out.pdf
This one crashes with Error: /undefinedresource in findresource:
$ gs -sDEVICE=pdfwrite -o mypg3o2.pdf -dBATCH \
  -c '/Osaka-Mono-Identity-H /H /CMap findresource [/Osaka-Mono-Identity /CIDFont findresource] == ' \
  -f mypg3out.pdf
Note finally that Ghostscript may automatically use some of the .ps scripts it installs; for instance, you can find gs_ttf.ps:
$ locate gs_ttf.ps
/usr/share/ghostscript/9.02/Resource/Init/gs_ttf.ps
... and then, editing that file (e.g. sudo nano /usr/share/ghostscript/9.02/Resource/Init/gs_ttf.ps), you can add the statement (Hello from gs_ttf.ps\n) print at the beginning of the code; then, whenever one of the above gs commands is called, the printout will be visible in stdout.
References
Adding your own fonts - Fonts and font facilities supplied with Ghostscript
About "CIDFnmap" of Ghostscript - Features to support CJK CID-keyed in Ghostscript
Bug 689538 – GhostScript can not handle an embedded TrueType CID-Font
Bug 692589 – "Error CIDSystemInfo and CMap dict not compatible" when converting merged file to PDF/A - #1522
Adobe Forums: CMap resources versus PDF mapping resources: Please keep in mind that a CMap resource unidirectionally maps character codes to CIDs. Those other resources that Acrobat uses are best referred to as PDF mapping resources. Among them, there is a special category called ToUnicode mapping resources that unidirectionally map CIDs to UTF-16BE character codes
Adobe CIDs and glyphs in CJK TrueType font
Ghostscript and Japanese TrueType font
Installation guide: GS and CID font
Debian -- Filelist of package poppler-data/sid/all

Pdf text box markups missing in the converted tiff file (GhostScript)

I am trying to convert pdf to tiff. You can view the pdf in the below link:
Original pdf
http://bugs.ghostscript.com/attachment.cgi?id=7736
I currently have Ghostscript 9.02 installed on my system.
I am using the below command to convert the pdf files to tiff.
gswin32 -dSAFER -dNOPAUSE -dBATCH -q -sPAPERSIZE=a4 -r300 -sDEVICE=tiffg4
-dPDFFitPage -dGraphicalAlphaBits=1 -dTextAlphaBits=1
-sOutputFile="d:/temp/test/ConvertedPage%06d.tiff"
"d:/temp/test/TextBoxMarkupfile.pdf"
There are 3 marked-up text boxes on the second page. After conversion, those text values are missing in the TIFF file.
Are there any options available in Ghostscript to include those text values in the converted image?
If any workarounds are available, please suggest them.
Thanks,
Rajesh
A similar question+answer recommended pdftk.
I used a two-step process of [pdf] >> pdftk >> ImageMagick:
pdftk original_form.pdf output flattened_form.pdf flatten
convert -define quantum:polarity=min-is-white -endian MSB
-units PixelsPerInch -density 204x196 -monochrome
-compress Fax -sample 1728
"flattened_form.pdf" "final.tif"
Since ImageMagick uses Ghostscript for PDF conversion, it should be possible to use GS directly, but I've gotten better quality from ImageMagick; perhaps I'm just not using the right GS settings.