Replace fonts by converting PDF to PDF/A - pdf

I faced a problem while developing converter from pdf to pdf/a. I use PDFBox library. I need not only convert existing pdf to pdf/a, but also replace font. In input pdf are used standart Helvetica and Courier. I want to replace Helvetica with another font. Is it possible to do it without re-rendering?

Related

Create arabic text pdf file using pdfbox

I want to create a PDF in arabic using PdfBox but don't want to use any external .ttf file. Is there any other way to do this?
I have already gone through many stuffs but all are using .ttf files.
PdfBox has the following fonts: TIMES_ROMAN,HELVETICA,COURIER,SYMBOL,ZAPF_DINGBATS with their styled counter parts. if you have tried all these fonts with no results, it means that the character encoding used for them does not support the Arabic characters.
Your best bet is to attach a .ttf font that contains those symbols.
Also you might want to take a look at this Write arabic characters with PDFBOX

Ghostscript generated pdf content is not able to copy

I am trying to convert a postscript file which contains some telugu Font (i.e Vani Bold). After converting the file into pdf I am not able to copy the text from generated pdf file .When I see the properties of pdf file in centos document viewer it is showing like below
I am using below command to convert postscript file to pdf
bin/gs -dBATCH -sDEVICE=pdfwrite -sNOPAUSE -dQUITE -sOutputFile=/home/cloudera/Desktop/PrintTest/telugu.pdf /home/cloudera/Desktop/PrintTest/VirtualPrinter_27_09_2016_19_11_41_691.ps
I tried with ghostscript 9.19 and 9.20 as well,but no change.
Following is the link to my postscript file which I am trying to convert into pdf.
click here for postscript file
I have been struggling with this since 10 days .Please provide some solution for this.
I can tell you why you can't copy & paste the text, but I'm not sure I can provide an acceptable solution.
First, not all pdf viewers can deal with unicode characters (for example,xpdf can't, it just ignores them, while mudpf and qpdfview work).
Second, to be able to convert font glyphs to unicode characters, the font object in the PDF file must contain a /ToUnicode property. If you look at the generated PDF after decompression (mutool clean -d), you can see that the Vani font in object 8 0 doesn't have it, while both the Arial font in object 10 0 and the Calibri font in object 12 0 do.
So very likely the Vani font is missing this unicode translation information, you need to either add this information (e.g. with fontforge), or choose a different font that has this information.
Related question:
https://superuser.com/questions/1124583/text-in-pdf-turns-gibberish-on-copying-but-displays-fine/1124617#1124617

Pdf partial font embedding with iText

I am asked to include partial font into a pdf.
I think I will use iText and I found how to embed the font but I found no clue about partial embedding.
Does anybody know if partial embedding is automatic ? Or maybe iText does not have this feature ?
Thank you.
When does iText embed the full font, a subset, or no font?
In this answer, it is assumed that you use the BaseFont class and the Font class like this:
BaseFont bf = BaseFont.createFont(pathToFont, encoding, embedded);
Font font = new Font(bf, 12);
In this snippet:
pathToFont is the path to a font file (.ttf, .ttc, otf, .afm),
encoding is an encoding such as "winansi", BaseFont.IDENTITY_H,...
embedded is a boolean: true or false.
Will iText embed the font or not?
That's determined by the embedded parameter:
If it is false, the font isn't embedded.
If it is true, the font is embedded, except in the case of the Standard Type 1 fonts or Type 1 fonts for which the .pfb file is missing or CJK fonts.
Regarding the exceptions:
The Standard Type 1 fonts are 4 flavors of Helvetica (regular, bold, italic, bold-italic), 4 flavors of Times Roman (...), 4 flavors of Courier (...), Symbol and Zapfdingbats. iText ships with 14 Adobe Font Metrics (AFM) files. These files contain the metrics that are needed to calculate widths of glyphs and words. iText doesn't have the necessary Printer Font Binary (PFB) files that are required to embed the font.
Type 1 fonts are stored in two files: an AFM file and a PFB file. If you provide an AFM file, iText will look for the PFB file in the same directory. If iText doesn't find any PFB file, the font can't be embedded.
CJK stands for a series of Chinese, Japanese and Korean fonts that are available in downloadable font packs. It's a special type of Asian fonts; Asian fonts in .ttf, .otf or .ttc files can be embedded.
Will iText subset the font or not?
iText will always try to embed a subset of the font, not the whole font, except in case you provide a Type 1 font (AFM and PFB file). In case a Type 1 font is provided, the full font is embedded.
Can iText embed the full font?
Yes, you can force iText to embed the full font by adding this line:
bf.setSubset(false);
However, this value will be ignored in case you use the encoding Identity-H because that's how it's described in ISO-32000-1. iText will only embed full fonts that are stored inside the PDF as a simple font (256 characters); iText will never embed fonts that are stored as a composite font (up to 65,535 characters).

Which Chinese font is commonly supported by PDF readers of Chinese people?

I am generating PDF files which contain English and Chinese characters (using the Ruby Prawn library). I don't want to embed a Chinese font file in the generated PDF files, because these files need to stay small. So I'm wondering if I could just mentioning a Chinese font name in my PDF files, and have the PDF readers correctly rendering the Chinese characters because the PDF readers would already have the Chinese font file.
Is that something sensible? If so is there any commonly used Chinese font that one can expect to be installed in most of the PDF readers used by Chinese people?
The best way to ensure that a PDF file can be displayed on a any reader, is to use partially embedded fonts (also known as font subset). In PDF, you don't need to include the whole font with your document, having a subset with just the glyphs that were used in the file is enough for the file to be portable.

PdfBox - change font of text in PDF

Is it possible to change text fonts in existed PDF through PdfBox? If yes how to do that? I have problems with some special fonts in PDF and I want to change them to font that is widely supported.
Thanks