I'm trying to create a PDF with fonts encoded as winansi instead of custom.
The source file is PCL, and I use GhostPCL to convert it to PDF using the pdfwrite device.
The PDF is created successfully. However, the font encoding (when checked with pdffonts) is 'custom', but I want it to be 'winansi'. How can I achieve this?
You almost certainly can't do that; you have no control over what pdfwrite chooses to do with the Encoding. Without seeing the input PCL file I can't comment on why the Encoding isn't winansi, but my guess is that the incoming PCL contains insufficient information to determine the font encoding, so the only alternative is a custom encoding.
If you are trying to make an editable/searchable PDF file from PCL input you cannot reliably do that.
I am not a programmer, but a normal user who uses Linux.
I want to use Ghostscript to DISPLAY PDF files, not to CREATE PDF files. (I have never used Ghostscript until now.)
But I want Ghostscript to automatically replace all fonts with other fonts when I open the PDF. No matter if the fonts are embedded or not.
With which fonts should the fonts be replaced?
Answer: I want to create a list of fonts, that I want to be available for replacement.
But which of these fonts on the list should be used?
Answer: The one that best matches the metric of the font to be replaced.
Is it possible to do this somehow?
You can't get Ghostscript to do what you are asking. If a PDF file contains fonts, Ghostscript will use those fonts; it will only substitute if it cannot find an embedded font.
The reason for this is simple: the font embedded in the PDF file is the correct font. Its metrics are correct, and the mapping from character code to the appropriate glyph selector in the font will be correct.
It's also a non-trivial problem to select from a list of fonts the one which 'best matches the metrics of the font to be replaced'. What characteristics should be considered? How should those be determined?
When a font is not embedded, Ghostscript will consult its own list of fonts and CIDFonts. Both of these lists can be customised; this is described in the Ghostscript documentation.
But since a substitute font is always going to be a compromise, you can't tell Ghostscript not to use the embedded fonts in a PDF. Well technically you could, by modifying the PDF interpreter, but you say you aren't a programmer, so I doubt you will want to try that.
The use of xfa inside pdf isn’t only for creating forms
In short: I need valid test cases for a new XFA PDF reader, but I couldn't find any, nor could I find how to use Ghostscript to create such test cases in batch.
The point is I don't know how to build the extra information Ghostscript should handle without a hex editor.
Ghostscript doesn't handle XFA at all, neither on input nor on output, so you cannot use Ghostscript to create XFA files.
Nor does Ghostscript (currently) create PDF files which solely consist of an image. Even if it did, these wouldn't be PNG or TIFF images, as those file formats are not directly supported by PDF. The next release of Ghostscript will contain devices which produce PDF files where the content is a rendered bitmap image created from the input. But they won't be either PNG or TIFF file format.
Note that XFA has been removed from the PDF 2.0 specification (hardly surprising, as it's XML, not PDF format).
I am trying to convert a PostScript file which contains a Telugu font (i.e. Vani Bold). After converting the file to PDF, I am not able to copy the text from the generated PDF file. When I view the properties of the PDF file in the CentOS document viewer, it shows the following:
I am using below command to convert postscript file to pdf
bin/gs -dBATCH -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -sOutputFile=/home/cloudera/Desktop/PrintTest/telugu.pdf /home/cloudera/Desktop/PrintTest/VirtualPrinter_27_09_2016_19_11_41_691.ps
I tried with Ghostscript 9.19 and 9.20 as well, but there was no change.
Following is the link to my postscript file which I am trying to convert into pdf.
click here for postscript file
I have been struggling with this for 10 days. Please provide a solution.
I can tell you why you can't copy & paste the text, but I'm not sure I can provide an acceptable solution.
First, not all PDF viewers can deal with Unicode characters (for example, xpdf can't and just ignores them, while mupdf and qpdfview work).
Second, to be able to convert font glyphs to unicode characters, the font object in the PDF file must contain a /ToUnicode property. If you look at the generated PDF after decompression (mutool clean -d), you can see that the Vani font in object 8 0 doesn't have it, while both the Arial font in object 10 0 and the Calibri font in object 12 0 do.
So very likely the Vani font is missing this Unicode translation information. You need to either add this information (e.g. with fontforge) or choose a different font that has it.
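The check described above can be scripted. The following is a minimal sketch, assuming the PDF has already been decompressed with mutool clean -d (for example `mutool clean -d telugu.pdf telugu-plain.pdf`, where the output name is a hypothetical choice) and that each object starts on its own "N G obj" line. The function name fonts_missing_tounicode is made up for this illustration. It prints the object numbers of font dictionaries that have no /ToUnicode entry.

```shell
#!/bin/sh
# List font objects without a /ToUnicode entry in a DECOMPRESSED PDF.
# Descendant CIDFont dictionaries are skipped, because for composite fonts
# the /ToUnicode entry belongs to the parent Type0 font dictionary.
fonts_missing_tounicode() {
    awk '
        /^[0-9]+ [0-9]+ obj/ { id = $1 " " $2; buf = "" }   # start of an object
        { buf = buf " " $0 " " }                            # collect its text
        /^endobj/ {
            # "/Type /Font" but not "/Type /FontDescriptor"
            if (buf ~ /\/Type *\/Font[^A-Za-z]/ &&
                buf !~ /\/Subtype *\/CIDFontType/ &&
                buf !~ /\/ToUnicode/)
                print id
        }
    ' "$1"
}

# Usage (hypothetical file name):
#   fonts_missing_tounicode telugu-plain.pdf
```

For the file discussed here, this would be expected to list the Vani font object and not the Arial or Calibri ones. It is only a text scan, not a PDF parser, so treat the result as a hint.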
Related question:
https://superuser.com/questions/1124583/text-in-pdf-turns-gibberish-on-copying-but-displays-fine/1124617#1124617
I am exploring tools to convert PDF documents to PDF/A. Ghostscript seems to support such a conversion out of the box. One issue is that some TrueType fonts in the original PDF document are not converted correctly. If I copy text from the converted PDF/A document and paste it into Notepad, the pasted text is garbled.
The original document text can be copied to notepad just fine.
I am using the following script:
gswin64 -dPDFA -dBATCH -dNOPAUSE -dUseCIEColor -sProcessColorModel=DeviceCMYK -sDEVICE=pdfwrite -sPDFACompatibilityPolicy=1 -sOutputFile=FilteredOutput.pdf Filtered1Page.pdf
I have uploaded a sample 1 page source PDF in Google Drive:
SampleInput
A sample output PDF/A document generated from the command is in Google drive here:
SampleOutput
Running the above command on this PDF on a Windows machine will reproduce the issue.
Are there any settings or commands that make the PDF/A conversion work properly?
Copy and paste from a PDF is not guaranteed. Subset fonts will not have a usable Encoding (such as ASCII or UTF-8), in which case they will only be amenable to cut/paste/search if they have an associated ToUnicode CMap. Many PDF files do not contain ToUnicode CMaps.
Of course, the PDF/A specification states (oddly in my opinion) that you should not use subset fonts, but it's not always possible to tell whether a font is subset (not all creators follow the XXXXXX+ prefix convention), and even if the font isn't subset there still isn't any guarantee that its Encoding is one that is usable.
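As a rough illustration of the prefix convention mentioned above: in the PDF specification, a subset font name carries a tag of six uppercase letters followed by a plus sign. The sketch below tests a font name (such as one reported by pdffonts) against that pattern. The function name is_subset_name is hypothetical, and, as noted, a name without the tag does not prove that the font is complete.

```shell
#!/bin/sh
# Succeed (exit status 0) if the font name starts with a subset tag:
# six uppercase ASCII letters followed by "+", e.g. ABCDEF+FreeSansBold.
# The body runs in a subshell so the locale change stays local.
is_subset_name() (
    LC_ALL=C   # keep [A-Z] limited to ASCII capitals
    case "$1" in
        [A-Z][A-Z][A-Z][A-Z][A-Z][A-Z]+*) true ;;
        *) false ;;
    esac
)

# Usage:
#   if is_subset_name "ABCDEF+FreeSansBold"; then echo "subset tag present"; fi
```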
Looking at the file you have posted, it does not contain one of the fonts it uses (Arial,Bold) and so Ghostscript substitutes with DroidSansFallback, and the font it does contain (FreeSansBold) is a subset (FWIW this font doesn't actually seem to be used....). The fallback font is a CIDFont, so there is no real prospect of the text being 'correct'.
I believe that if you make a real font available to Ghostscript to replace Arial,Bold then it will probably work correctly. This would also fix the rather more obvious problem of the spacing of the characters being incorrect (in one place, wildly incorrect), which is caused by the fallback font having different widths to the original.
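One way to make a real font available is a Fontmap entry that maps the font name used in the PDF to a font file on disk. The sketch below only writes such an entry to a scratch file. The file name Fontmap.local and the font path are assumptions for illustration (arialbd.ttf is where Arial Bold usually lives on Windows), so adjust both to your system. The entry then needs to be added to the Fontmap file Ghostscript actually reads, or supplied through the -sFONTMAP switch; check the documentation for your Ghostscript version for the details.

```shell
#!/bin/sh
# Write a one-entry Fontmap fragment. Fontmap entries have the form
#   /FontNameUsedInThePDF (path/to/fontfile) ;
# Fontmap.local and the font path below are assumptions for this sketch.
cat > Fontmap.local <<'EOF'
% Map the non-embedded PDF font to a real TrueType file.
/Arial,Bold (C:/Windows/Fonts/arialbd.ttf) ;
EOF

# Show the entry so it can be reviewed before merging it into
# Ghostscript's own Fontmap.
cat Fontmap.local
```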
NB: as the warning messages have already told you, don't use -dUseCIEColor.
The fact that you cannot copy/paste/search a PDF does not mean that it is not a valid PDF/A-1b file though, so this does not mean that the creation (NOT conversion) of the PDF/A-1b is not 'proper'.
I'm trying to write my own PostScript file manually and want to use a custom TTF font downloaded from the web but it's not using it - either uses some other font or doesn't display the text at all. I don't have problems with the fonts installed in the system.
The commands I used were different variations of:
/FontName /TheFontName def
/TheFontName 20 selectfont
(XXXXXXXXXXX) show
You can't use a TrueType font directly in PostScript. Unlike PDF, PostScript doesn't support TrueType.
In order to use a TrueType font you must first convert it into a Type 42 font, which PostScript does support.
Adobe Technical Note 5012 documents the Type 42 format.
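A minimal sketch of that workflow, under several assumptions: MyFont.ttf, MyFont.t42 and the PostScript font name MyFont are hypothetical, and the converter shown in the comment (ttftotype42 from lcdf-typetools) is just one possible tool that you would need to have installed. The script itself only writes the page description; the conversion and concatenation steps are left as comments.

```shell
#!/bin/sh
# Step 1 (not run here): wrap the TrueType file as a Type 42 font, e.g.
#   ttftotype42 MyFont.ttf MyFont.t42
# ttftotype42 is an assumed tool choice; any Type 42 converter will do.

# Step 2: write the page description that uses the font.
cat > page.ps <<'EOF'
/MyFont 20 selectfont   % must match the /FontName inside the Type 42 file
72 720 moveto           % show needs a current point
(Hello) show
showpage
EOF

# Step 3 (not run here): put the font definition ahead of the page, so the
# font is defined before selectfont looks it up:
#   cat MyFont.t42 page.ps > hello.ps
```

Note the moveto before show: without a current point, show raises a nocurrentpoint error and no text appears, independent of any font problem.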
Alternatively, you can convert the TTF font to a Type 1 font (pfb, with its pfm metrics file), which PostScript also supports. Online tools are available for this conversion.