How to use font in other PDF files? (itext7 PDF) - pdf

I am now trying to modify a PDF file with ONLY text content. When I use
TextRenderInfo.getFont()
it returns me a Font which is actually an indirect object.
pdf.inderect.object.belong.to.other.pdf.document.Copy.object.to.current.pdf.document
would be thrown in this case when close the PdfDocument.
Is there a way to let me reuse this Font in a new PDF file? OR, is there a way to in-place edit the text content in PDF (without changing the font, color, fontSize)?
I'm using itext7.
Thanks

First of all, from the error message I see that you are not using the latest version of iText, which is 7.0.2 at the moment. So I recommend that you update your iText version.
Secondly, it is indeed possible to use a font in another document. But to do that, you first have to copy the corresponding font object to that other document (as stated in the exception message by the way). But you should be warned that this approach has some limitations, e.g. in case of a font subset, you will only be able to use the glyphs that are present in the original font subset in the source document and will not be able to use other glyphs.
PdfFont font = textRenderInfo.getFont(); // font from source document
PdfDocument newPdfDoc = ... // new PdfDocument you want to write some text to
// copy the font dictionary to the new document
PdfDictionary fontCopy = font.getPdfObject().copyTo(newPdfDoc);
// create a PdfFont instance corresponding to the font in the new document
PdfFont newFont = PdfFontFactory.createFont(fontCopy);
// Use newFont in newPdfDoc, e.g.:
Document doc = new Document(newPdfDoc);
doc.add(new Paragraph("Hello").setFont(newFont));

Related

Custom Property for PDF File

I would like to add total page count as a custom property like ms word does as follow.
Is it possible to similar things for pdf? I am also using aspose for file conversion. I converter many kind of file types to pdf but if we want to show also the document's pagen count as the custom property.
This is from the Aspose Newsgroup:
//open document
Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document("Test.pdf");
//set properties
pdfDocument.Metadata["xmp:Nickname"] = "Nickname";
pdfDocument.Metadata["xmp:CreatorTool"] = "Test Value";
pdfDocument.Metadata["xmp:CustomProperty"] = "Custom Value";
//Convert to PDF/A compliant document
pdfDocument.Validate("log.xml", PdfFormat.PDF_A_1A);
pdfDocument.Convert("log.xml", PdfFormat.PDF_A_1A, ConvertErrorAction.Delete);
//save output document
pdfDocument.Save("output_XMP.pdf");
Is this what you are looking for?

Why is flying saucer always printing PDF on A4 paper?

I'm trying to save an html document to PDF using flyingsaucer but the generated document always ends up having an A4 dimension when I look at the Document Properties from Adobe Reader (Page Size: 8.26 x 11.69 in).
I did read the documentation and I'm passing the css #page {size: letter;} style. And while it does have an effect on the output, the page size always remains 8.26 x 11.69 in Adobe Reader. For example, if I set the page size to legal, my PDF is still the size of a A4 but the top of the document is missing as if it had fell off the "paper".
I'm not sure if the problem falls on the itext side or the flying saucer side. I was using a fairly old version so my first step was to upgrade to the latest 9.1.6 version of flying saucer. I also moved from itext 2.0.8 to openPDF 1.0.1 but I'm still getting the same behavior.
I also traced in the debugger up to the com.lowagie.text.Document creation in ITextRenderer and at this point the document size passed is correct. That makes me think that the issue might be in openPDF / iText but I can't find what I'm doing wrong.
It turns out the PDF generation was correctly using the #page size declaration and the problem was occurring later in our software. What I had not noticed is that after the generation of the PDF another method was called to merge multiple PDFs into one. This method should probably not have been called, but that's another story.
The bottom line is this method created a new com.lowagie.text.Document(), which by default creates an A4 sized document, and then was iterating over all pages of the pdf, adding the pages to the new document using pdfWriter.getImportedPage(pdfReader, currentPage++). These imported pages did not retain their original size.
I fixed it by passing the page size of the fist page when creating the merged document object:
document = new Document(pdfReader.getPageSize(1));
The real problem is that you're (unwittingly) using software that is no longer supported. Anything that still has the namespace lowagie (the founder and CTO of iText) is really outdated.
If you simply want to convert HTML to pdf, why not use iText directly and cut out the middle-man?
We have multiple options for you.
XMLWorker (iText5 based code that converts HTML to pdf)
pdfHTML (iText7 based add-on that converts HTML5/CSS3 to pdf)
This is a rather extensive code-sample for using pdfHTML:
public void createPdf(String src, String dest, String resources) throws IOException {
try {
FileOutputStream outputStream = new FileOutputStream(dest);
WriterProperties writerProperties = new WriterProperties();
//Add metadata
writerProperties.addXmpMetadata();
PdfWriter pdfWriter = new PdfWriter(outputStream, writerProperties);
PdfDocument pdfDoc = new PdfDocument(pdfWriter);
pdfDoc.getCatalog().setLang(new PdfString("en-US"));
//Set the document to be tagged
pdfDoc.setTagged();
pdfDoc.getCatalog().setViewerPreferences(new PdfViewerPreferences().setDisplayDocTitle(true));
//Set meta tags
PdfDocumentInfo pdfMetaData = pdfDoc.getDocumentInfo();
pdfMetaData.setAuthor("Joris Schellekens");
pdfMetaData.addCreationDate();
pdfMetaData.getProducer();
pdfMetaData.setCreator("iText Software");
pdfMetaData.setKeywords("example, accessibility");
pdfMetaData.setSubject("PDF accessibility");
//Title is derived from html
// pdf conversion
ConverterProperties props = new ConverterProperties();
FontProvider fp = new FontProvider();
fp.addStandardPdfFonts();
fp.addDirectory(resources);//The noto-nashk font file (.ttf extension) is placed in the resources
props.setFontProvider(fp);
props.setBaseUri(resources);
//Setup custom tagworker factory for better tagging of headers
DefaultTagWorkerFactory tagWorkerFactory = new AccessibilityTagWorkerFactory();
props.setTagWorkerFactory(tagWorkerFactory);
HtmlConverter.convertToPdf(new FileInputStream(src), pdfDoc, props);
pdfDoc.close();
} catch (Exception e) {
e.printStackTrace();
}
}
You can find more information at http://itextpdf.com/itext7/pdfHTML

How to merge PDFs to a single file without multiple copies of the same font?

I create PDFs and concatenate them into a single PDF.
My resulting PDF is a lot bigger than I had expected in file size.
I realised that my output PDF has a ton of duplicate font, and it is the reason of unexpectedly big file size.
Here, my question is:
I would like to create PDFs which only embed font information, so let they use Windows System Font.
When I merge them into a single PDF, I insert actual font which PDF needs.
If possible, please let me know how to do it.
I've created the MergeAndAddFont example to explain the different options.
We'll create PDFs using this code snippet:
public void createPdf(String filename, String text, boolean embedded, boolean subset) throws DocumentException, IOException {
// step 1
Document document = new Document();
// step 2
PdfWriter.getInstance(document, new FileOutputStream(filename));
// step 3
document.open();
// step 4
BaseFont bf = BaseFont.createFont(FONT, BaseFont.WINANSI, embedded);
bf.setSubset(subset);
Font font = new Font(bf, 12);
document.add(new Paragraph(text, font));
// step 5
document.close();
}
We use this code to create 3 test files, 1, 2, 3 and we'll do this 3 times: A, B, C.
The first time, we use the parameters embedded = true and subset = true, resulting in the files testA1.pdf with text "abcdefgh" (3.71 KB), testA2.pdf with text "ijklmnopq" (3.49 KB) and testA3.pdf with text "rstuvwxyz" (3.55 KB). The font is embedded and the file size is relatively low because we only embed a subset of the font.
Now we merge these files using the following code, using the smart parameter to indicate whether we want to use PdfCopy or PdfSmartCopy:
public void mergeFiles(String[] files, String result, boolean smart) throws IOException, DocumentException {
Document document = new Document();
PdfCopy copy;
if (smart)
copy = new PdfSmartCopy(document, new FileOutputStream(result));
else
copy = new PdfCopy(document, new FileOutputStream(result));
document.open();
PdfReader[] reader = new PdfReader[3];
for (int i = 0; i < files.length; i++) {
reader[i] = new PdfReader(files[i]);
copy.addDocument(reader[i]);
}
document.close();
for (int i = 0; i < reader.length; i++) {
reader[i].close();
}
}
When we merge the document, be it with PdfCopy or PdfSmartCopy, the different subsets of the same font will be copied as separate objects in the resulting PDF testA_merged1.pdf / testA_merged2.pdf (both 9.75 KB).
This is the problem you are experiencing: PdfSmartCopy can detect and reuse identical objects, but the different subsets of the same font aren't identical and iText can't merge different subsets of the same font into one font.
The second time, we use the parameters embedded = true and subset = false, resulting in the files testB1.pdf (21.38 KB), testB2.pdf (21.38 KB) and testA3.pdf (21.38 KB). The font is fully embedded and the file size of a single file is a lot bigger than before because the full font is embedded.
If we merge the files using PdfCopy, the font will be present in the merged document redundantly, resulting in the bloated file testB_merged1.pdf (63.16 KB). This is definitely not what you want!
However, if we use PdfSmartCopy, iText detects an identical font stream and reuses it, resulting in testB_merged2.pdf (21.95 KB) which is much smaller than we had with PdfCopy. It's still bigger than the document with the subsetted fonts, but if you're concatenating a huge amount of files, the result will be better if you embed the complete font.
The third time, we use the parameters embedded = false and subset = false, resulting in the files testC1.pdf (2.04 KB), testC2.pdf (2.04 KB) and testC3.pdf (2.04 KB). The font isn't embedded, resulting in an excellent file size, but if you compare with one of the previous results, you'll see that the font looks completely different.
We merge the files using PdfSmartCopy, resulting in testC_merged1.pdf (2.6 KB). Again, we have an excellent file size, but again we have the problem that the font isn't visualized correctly.
To fix this, we need to embed the font:
private void embedFont(String merged, String fontfile, String result) throws IOException, DocumentException {
// the font file
RandomAccessFile raf = new RandomAccessFile(fontfile, "r");
byte fontbytes[] = new byte[(int)raf.length()];
raf.readFully(fontbytes);
raf.close();
// create a new stream for the font file
PdfStream stream = new PdfStream(fontbytes);
stream.flateCompress();
stream.put(PdfName.LENGTH1, new PdfNumber(fontbytes.length));
// create a reader object
PdfReader reader = new PdfReader(merged);
int n = reader.getXrefSize();
PdfObject object;
PdfDictionary font;
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(result));
PdfName fontname = new PdfName(BaseFont.createFont(fontfile, BaseFont.WINANSI, BaseFont.NOT_EMBEDDED).getPostscriptFontName());
for (int i = 0; i < n; i++) {
object = reader.getPdfObject(i);
if (object == null || !object.isDictionary())
continue;
font = (PdfDictionary)object;
if (PdfName.FONTDESCRIPTOR.equals(font.get(PdfName.TYPE))
&& fontname.equals(font.get(PdfName.FONTNAME))) {
PdfIndirectObject objref = stamper.getWriter().addToBody(stream);
font.put(PdfName.FONTFILE2, objref.getIndirectReference());
}
}
stamper.close();
reader.close();
}
Now, we have the file testC_merged2.pdf (22.03 KB) and that's actually the answer to your question. As you can see, the second option is better than this third option.
Caveats: This example uses the Gravitas One font as a simple font. As soon as you use the font as a composite font (you tell iText to use it as a composite font by choosing the encoding IDENTITY-H or IDENTITY-V), you can no longer choose whether or not to embed the font, whether or not to subset the font. As defined in ISO-32000-1, iText will always embed composite fonts and will always subset them.
This means that you can't use the above solutions when you need special fonts (Chinese, Japanese, Korean). In that case, you shouldn't embed the fonts, but use so-called CJK fonts. They CJK fonts will use font packs that can be downloaded by Adobe Reader.

iTextSharp - PDF field contents become invisible

I have a PDF where I add some TextFields.
var txtFld = new TextField(stamper.Writer, new Rectangle(cRightX - cWidthX, cTopY3, cRightX, cTopY), FieldNameProtocol) { Font = bf, FontSize = cHeaderFontSize, Alignment = Element.ALIGN_RIGHT, Options = PdfFormField.FF_MULTILINE };
stamper.AddAnnotation(txtFld.GetTextField(), 1);
The ‘bf’ above is a Unicode font that gets embedded in the PDF:
BaseFont bf = BaseFont.CreateFont(UnicodeFontPath, BaseFont.IDENTITY_H, BaseFont.EMBEDDED); // Create a Unicode font to write in Greek...
Later-on I fill those fields with greek text.
var acrof = stamper.AcroFields;
acrof.SetField(fieldName, field.Value/*, field.Value*/); // Set the text of the form field.
acrof.SetFieldProperty(fieldName, "setfflags", PdfFormField.FF_READ_ONLY, null); // Make it readonly.
When I view the PDF, most of the times the text is missing and if I click on the (invisible) TextField in Acrobat, then the text becomes visible (until it loses focus again).
Any idea what is going on here?
I have also tried using non-embedded font, but I get the same thing (although I still seem to get embedded fonts in PDF that are similar to the font I use). I don't know if I am missing sth.
It seemed that I was doing the following at the wrong order (the following is the correct):
acrof.SetFieldProperty(field.Name, "setfflags", PdfFormField.FF_READ_ONLY, null); // Make it readonly.
acrof.SetFieldProperty(field.Name, "textfont", bf, null);
acrof.SetField(field.Name, field.Value/*, field.Value*/); // Set the text of the form field.
At least that's hat I think it was wrong.
I have made many changes.

How do I figure out the font family and the font size of the words in a pdf document?

How do I figure out the font family and the font size of the words in a pdf document? We are actually trying to generate a pdf document programmatically using iText, but we are not sure how to find out the font family and the font size of the original document which needs to be generated. document properties doesn't seem to contain this information
Fonts are stored in the catalog (I suppose in a sub-catalog of type font). If you open a pdf as a text file, you should be able to find catalog entries (they begin and end with "<<" and ">>" respectively.
On a simple pdf file, i found the following:
<</Type/Font/BaseFont/Helvetica-Bold/Subtype/Type1/Encoding/WinAnsiEncoding>>
thus searching for the prefix should help you (in some pdf files, there are spaces between
the commponents but '/Type /Font' should be ok).
Of course this is a manual process, while you would probably prefer an automatic one.
On another note, we sometime use identifont or what the font to find uncommon fonts that give us problem (logo font).
regards
Guillaume
Edit : the following code will find all font in the pages. To be short, you search the dictionnary of each page for the subdictionnary "ressource" and then the subdictionnary "font". Each entry in the later is a font dictionnary, describing a font.
PdfReader reader = new PdfReader(
new FileInputStream(new File("file.pdf")));
int nbmax = reader.getNumberOfPages();
System.out.println("nb pages " + nbmax);
for (int i = 1; i <= nbmax; i++) {
System.out.println("----------------------------------------");
System.out.println("Page " + i);
PdfDictionary dico = reader.getPageN(i);
PdfDictionary ressource = dico.getAsDict(PdfName.RESOURCES);
PdfDictionary font = ressource.getAsDict(PdfName.FONT);
// we got the page fonts
Set keys = font.getKeys();
Iterator it = keys.iterator();
while (it.hasNext()) {
PdfName name = (PdfName) it.next();
PdfDictionary fontdict = font.getAsDict(name);
PdfObject typeFont = fontdict.getDirectObject(PdfName.SUBTYPE);
PdfObject baseFont = fontdict.getDirectObject(PdfName.BASEFONT);
System.out.println(baseFont.toString());
}
}
The name (variable "name" in the following code) is what is used in the text to change font. In the PDF, you'll have to find it next to a text. The following number is the size. Here for example, it's size 12. (sorry, still no code for this part).
BT
/F13 12 Tf
288 720 Td
the text to find Tj
ET
Depending on the PDF, if it hasn't been outlined you may be able to open it in Adobe Illustrator, double click the text and select some of it to see it's font family, size, etc.
If the text is outlined then use one of those online tools that PATRY suggests to find out the font.
Good luck
If you have Adobe Acrobat you can see the fonts inside and examine the objects and text streams. I wrote a blog post on this at http://pdf.jpedal.org/java-pdf-blog/bid/10479/Viewing-PDF-objects