PDF-Forms with Unicode chars [closed] - pdf

I am currently struggling with a PDF form created from a LibreOffice document.
I created it as suggested in the book "iText in Action" and am now trying to pre-fill the embedded form with a few values that can contain Unicode chars.
This includes characters that consist of a base char with an additional combining char (e.g. M̂).
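To make the combining-character case concrete, here is a minimal sketch that uses only the JDK's java.text.Normalizer. The class name CombiningCharDemo and its helper methods are made up for illustration and are not part of iText. It shows that M followed by U+0302 has no precomposed code point, so normalizing to NFC cannot collapse it, whereas e followed by U+0302 composes to ê. For M̂ the font and the viewer therefore have to position the combining mark themselves.

```java
import java.text.Normalizer;

final class CombiningCharDemo {
    /** Normalizes to NFC, which composes base + combining mark when a precomposed code point exists. */
    static String toNfc(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    /** Counts Unicode code points rather than UTF-16 chars. */
    static int codePoints(String text) {
        return text.codePointCount(0, text.length());
    }

    public static void main(String[] args) {
        String withM = "aM\u0302a"; // M + COMBINING CIRCUMFLEX ACCENT
        String withE = "ae\u0302a"; // e + COMBINING CIRCUMFLEX ACCENT
        System.out.println(codePoints(toNfc(withM))); // 4: no precomposed form exists
        System.out.println(codePoints(toNfc(withE))); // 3: composed to U+00EA
    }
}
```

Pre-normalizing field values this way can reduce the number of combining sequences a viewer has to handle, but it cannot help with sequences such as M̂.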
I have tried several hints I found on Stack Overflow and in the book, but I never got a PDF document with a form that works in all readers on all platforms (Okular and Evince on Linux, Acrobat DC, macOS Preview, etc.).
I'm aware that I need a font that covers the chars and that the font has to be fully embedded. Below is the code I used to fill the PDF document, followed by the PDF files.
My questions are:
Is the different behavior of the PDF readers due to a weakness in the PDF specification that I have to live with?
Especially the Linux PDF readers and Acrobat behave badly. Are there known bugs?
I'm not very familiar with internals of PDF, so any suggestions? Are the contents of my PDF files ok?
Any suggestions on how to improve the code to get better results?
Code to fill the form:
BaseFont uniFont = BaseFont.createFont("./src/main/resources/UnicodeDoc.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED, false, null, null, false);
uniFont.setSubset(false);
// Debugging code...
for (String codepage : uniFont.getCodePagesSupported()) {
    System.out.println("Codepage = " + codepage);
}
FileInputStream fis = new FileInputStream(src);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
PdfReader reader = new PdfReader(fis);
PdfStamper stamper = new PdfStamper(reader, baos);
// Fill all fields in PDF form
String text = "aM\u0302a"; // Same as "aM̂a"
com.itextpdf.text.pdf.AcroFields form = stamper.getAcroFields();
for (String fname : form.getFields().keySet()) {
    System.out.println("form." + fname);
    form.setField(fname, text);
    form.setFieldProperty(fname, "textfont", uniFont, null);
}
form.setGenerateAppearances(true);
form.addSubstitutionFont(uniFont);
stamper.setFormFlattening(false);
stamper.close();
reader.close();
Template
Template filled
Font
Thanks in advance, Mik86

I'm not very familiar with internals of PDF, so any suggestions? Are the contents of my PDF files ok?
I'll have to dig into the PDF specification to see if there is something definitively incorrect going on, but to me there does appear to be some confusion.
Firstly, your input Template gives me an error when I attempt to open it in Acrobat, and LiveCycle complains that "UnicodeDoc" must be swapped out for a different font. "UnicodeDoc" is used within the original input file:
Note that the font "UnicodeDoc" is not embedded in your input file. When filling in you create and embed a font, but it looks like you don't overwrite the original (again, not to say this is correct or incorrect):
Without going too much into the inner workings of PDFs: the form field that is getting filled out still links to the original font, which isn't embedded.
This doesn't necessarily directly address the issue, but if I "fix" your document by removing the font from the original template:
input.pdf
And run it through your code it produces output.pdf which has the correct output in Acrobat and Reader.
Again, this isn't to say your PDF is wrong or iText is wrong in this case as I haven't looked through the entire specification to see what (if any) interaction is expected here, but as it stands the font that you are embedding is not the font that ends up getting used in the form field.

Related

how can i export DataGridView with ARABIC data from Visual Basic to PDF by using iTextSharp [duplicate]

I have a problem with inserting Unicode characters in a PDF file in Eclipse.
I found a solution, but it does not work well for me.
The solution looks like this:
document.add(new Paragraph("Unicode: \u0418", new Font(bfComic, 12)));
I want to retrieve data from a database and show them to the user and my characters are in Arabic script and sometimes in Farsi script.
What solution do you suggest?
thanks
You are experiencing different problems:
Encoding of the data:
Please download chapter 2 of my book and go to section 2.2.2 entitled "The Phrase object: a List of Chunks with leading". In this section, look for the title "Database encoding versus the default CharSet used by the JVM".
You will see that database values are retrieved like this:
String name1 = new String(rs.getBytes("given_name"), "UTF-8");
That’s because the database contains different names with special characters. You risk that these special characters are displayed as gibberish if you retrieve the field like this:
String name2 = rs.getString("given_name");
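The effect of decoding with the wrong charset can be reproduced without a database. The following is a minimal sketch under the assumption that the column stores UTF-8 bytes; the class CharsetDecodingDemo and its methods are hypothetical names used only for illustration. It decodes the same bytes once as UTF-8 and once as ISO-8859-1, the latter standing in for a mismatched default charset.

```java
import java.nio.charset.StandardCharsets;

final class CharsetDecodingDemo {
    /** Decodes the raw column bytes with the charset the database actually used. */
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    /** Decodes the same bytes with a mismatched single-byte charset. */
    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\u0644\u0648"; // Arabic lam + waw
        byte[] raw = original.getBytes(StandardCharsets.UTF_8); // 2 bytes per letter

        System.out.println(decodeUtf8(raw).equals(original));   // true
        System.out.println(decodeLatin1(raw).equals(original)); // false
        System.out.println(decodeLatin1(raw).length());         // 4: one char per byte
    }
}
```

Whether rs.getString() goes wrong in practice depends on the JDBC driver and connection settings, so this only illustrates the failure mode, not a specific driver's behavior.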
Encoding of the font:
You create your font like this:
Font font = new Font(bfComic, 12);
You don't show how you create bfComic, but I assume that this object is a BaseFont object using IDENTITY_H as encoding.
Writing from right to left / making ligatures
Although your code will work to show a single character, it won't work to show a sentence correctly.
Suppose that name1 is the Arabic version of the name "Lawrence of Arabia" and that we want to write this name to a PDF. This is done three times in the following screen shot:
The first line is wrong, because the characters are in the wrong order. They are written from left to right whereas they should be written from right to left. This is what will happen when you do:
document.add(new Paragraph(name1, font));
Even if the encoding is correct, you're rendering the text incorrectly.
The second line is also wrong. The characters are now in the correct order, but no ligatures are made: ل followed by و should be combined into a single glyph: لو
You can only achieve this by adding the content to a ColumnText or PdfPCell object, and by setting the run direction to PdfWriter.RUN_DIRECTION_RTL. For instance:
pdfCell.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);
Now the text will be rendered correctly.
This is explained in chapter 11 of my book. You can find a full example here: Ligatures2
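If the data coming from the database mixes Latin, Arabic, and Farsi values, it may help to detect which strings need right-to-left handling before choosing the run direction. This is a minimal sketch using the JDK's java.text.Bidi; the class RunDirectionCheck and its method names are assumptions for illustration and are not part of iText.

```java
import java.text.Bidi;

final class RunDirectionCheck {
    /** True if the text contains any character that requires bidirectional processing. */
    static boolean containsRtl(String text) {
        char[] chars = text.toCharArray();
        return Bidi.requiresBidi(chars, 0, chars.length);
    }

    /** True if the whole text is right-to-left, with the base direction taken from the first strong character. */
    static boolean isPurelyRtl(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        return bidi.isRightToLeft();
    }

    public static void main(String[] args) {
        System.out.println(containsRtl("Lawrence"));           // false
        System.out.println(containsRtl("\u0644\u0648"));       // true
        System.out.println(isPurelyRtl("\u0644\u0648"));       // true
        System.out.println(isPurelyRtl("Lawrence \u0644\u0648")); // false: mixed text
    }
}
```

A caller could then set PdfWriter.RUN_DIRECTION_RTL on the cell only when containsRtl returns true. How mixed text should be laid out is a separate decision that this check does not make.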

How to move XFA xml data into PDF/A-2 conforming File with iText/XFA Worker

Adobe's ISO 32000 spec for PDF/A states that XFA data can be stored in a special place in a PDF/A-2 conforming PDF. Here is the text of that section.
Incorporation of XFA Datasets into a PDF/A-2 Conforming File
To support PDF/A-2 conforming files, ExtensionLevel 3 adds support for XML form data (XFA datasets)
through the XFAResources name tree, which is part of the name dictionary of the document catalog.
(See “TABLE 3.28 Entries in the name dictionary” on page 23.) While Acrobat forms (and form data) are
permitted in a PDF/A-2 conforming file, XML forms are not. Such XML forms are specified as XDP streams
referenced from interactive form dictionaries. XDP streams can contain XFA datasets.
For applications that convert PDF documents to PDF/A-2, the XFAResources name tree supports
relocation of XML form data from XDP streams in a PDF document into the XFAResources name tree.
The XFAResources name tree consists of a string name and an indirect reference to a stream. The string
name is created at the time the document is converted to a PDF/A-2 conforming file. The stream contains
the <xfa:datasets> element of the XFA, comprised of <xfa:data> elements.
In addition to data values for XML form fields, the <xfa:datasets> elements enable the storage and retrieval
of other types of information that may be useful for other workflows, including data that is not bound to
form fields, and one or more XML signature(s).
See the XML Architecture, XML Forms Architecture (XFA) Specification, version 2.6 in the Bibliography
We have an XFA Form that we pass xml to and now need to convert that document to PDF/A-2.
We are currently testing out XFA Worker to see if it will allow us to do this, but I have been unable to find an XFA Worker sample that does this.
I first tried to flatten with XFA Worker, but that removes the data completely, so it can no longer be extracted.
How do you get the XFA xml data into the place that Adobe says to put it in with XFA Worker?
UPDATE: Thanks Bruno, my code isn't allowing me to convert the XFA Form to PDF/A-2. Here is the code I used.
xfa.fillXfaForm(new ByteArrayInputStream(xmlSchemaStream.toByteArray()));
stamper.close();
reader.close();
try (ByteArrayOutputStream outputStreamDest = new ByteArrayOutputStream()) {
PdfReader pdfAReader = new PdfReader(output.toByteArray());
PdfAStamper pdfAStamper = new PdfAStamper(pdfAReader, outputStreamDest, PdfAConformanceLevel.PDF_A_2A);
....
and I get an error com.itextpdf.text.pdf.PdfAConformanceException: Only PDF/A documents can be opened in PdfAStamper.
So I am now assuming the new PdfAStamper isn't a converter but just reading in the byte array of the XFA PDF.
Allow me to start with some fatherly advice. XFA will be deprecated in ISO-32000-2 (PDF 2.0) and it is great that you are turning your XFA documents into PDF/A documents. However, why would you choose PDF/A-2? PDF/A-3 is identical to PDF/A-2 with one exception: in PDF/A-3, you are allowed to embed XML files. You can even indicate the relationship between the attached XML and the PDF. Wouldn't it be smarter to create a PDF/A-3 file and to attach the original data (not the XFA file) as an attachment?
Suppose that you ignore this fatherly advice, what could you do?
Annex D of ISO-19005-2 (and -3) tells you that you have to add an entry to the Names dictionary of the document catalog. Unfortunately, iText 5 doesn't allow you to add your own entries to this names dictionary while creating a file, so you will have to post-process the document.
Suppose that you have a file located in filePath, then you can get the Catalog entry and the Names entry of the Catalog entry like this:
PdfReader reader = new PdfReader(filePath);
PdfDictionary catalog = reader.getCatalog();
PdfDictionary names = catalog.getAsDict(PdfName.NAMES);
You can add entries to this names dictionary. For instance: suppose that I want to add a stream with the content "Some bytes" as a custom entry, I would use this code:
public void manipulatePdf(String src, String dest) throws IOException, DocumentException {
    PdfReader reader = new PdfReader(src);
    PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
    PdfDictionary catalog = reader.getCatalog();
    PdfDictionary names = catalog.getAsDict(PdfName.NAMES);
    if (names == null) {
        names = new PdfDictionary();
    }
    PdfStream stream = new PdfStream("Some bytes".getBytes());
    PdfIndirectObject objref = stamper.getWriter().addToBody(stream);
    names.put(new PdfName("ITXT_Custom"), objref.getIndirectReference());
    catalog.put(PdfName.NAMES, names);
    stamper.close();
    reader.close();
}
The result would look like this:
In your case, you don't want an entry named ITXT_Custom. You want to add an entry called XFAResources and the value of that entry should be a name tree consisting of a string name and an indirect reference to a stream. It should be fairly easy to adapt my example to achieve this.
Note: All code provided by me on Stack Overflow can be used under the CC-BY-SA as defined in the Stack Exchange Network Terms of Service. If you do not like the CC-BY-SA, I also provide this code under the same license as used for iText, more specifically the AGPL.

PDFBox - include multiple color profiles during conversion to PDF/A

We are currently trying to merge multiple PDFs and create a PDF/A (1B) out of it.
Currently we face a problem when we want to fix the color profiles. The PDF we receive has no embedded color profiles, so during the merge functionality of PDFBox, no OutputIntents are merged. So in the last step we try to add the color profiles.
If we do not add any color profile, we get validation issues for RGB and CMYK. If we add both color profiles to the PDDocumentCatalog, then only the validation issues for the first one are gone. So if we add RGB first, we only get CMYK validation issues and vice versa.
Here is a part of the code when we add the color profiles:
public void convertToPDFA(PDDocument doc, String file){
PDMetadata metadata = new PDMetadata(doc);
PDDocumentCatalog cat = doc.getDocumentCatalog();
cat.setMetadata(metadata);
// do metadata stuff, just removed it for now
InputStream colorProfile = PDFService.class.getResourceAsStream("/pdfa/sRGB Color Space Profile.icm");
PDOutputIntent oi = new PDOutputIntent(doc, colorProfile);
oi.setInfo("sRGB IEC61966-2.1");
oi.setOutputCondition("sRGB IEC61966-2.1");
oi.setOutputConditionIdentifier("sRGB IEC61966-2.1");
oi.setRegistryName("http://www.color.org");
cat.addOutputIntent(oi);
This is the code for RGB, we also add another *.icm color profile for CMYK.
So the color profiles themselves seem to be fine, because whichever one we add first, its validation issues are gone.
To me it feels like we are just missing a small thing to get both color profiles accepted. Or could it be that only one color profile can be used for the creation of a PDF/A?
Thanks in advance and kind regards
Only a single output intent is allowed, see here. An alternative is also mentioned there, which would be to use only ICC based colorspaces.
What should be possible (although beyond the scope of the question) would be to assign ICC profiles to /DeviceGray, /DeviceRGB, or /DeviceCMYK by adding DefaultGray, DefaultRGB, or DefaultCMYK entries to the ColorSpace subdictionary of the resource dictionary, as explained in section 8.6.5.6 of the PDF specification:
When a device colour space is selected, the ColorSpace subdictionary
of the current resource dictionary (see 7.8.3, "Resource
Dictionaries") is checked for the presence of an entry designating a
corresponding default colour space (DefaultGray, DefaultRGB, or
DefaultCMYK, corresponding to DeviceGray, DeviceRGB, or DeviceCMYK,
respectively). If such an entry is present, its value shall be used as
the colour space for the operation currently being performed.
Be aware that making a PDF file PDF/A-1b conformant is often trickier than just adding output intents. Check your file with PDFBox preflight or with the online validator from PDF Tools; there are many possible errors, which is why there are products from Callas Software or PDF Tools that convert PDF files to PDF/A.

Add a cover page to a PDF document

I create a PDF document with EVO PDF library from a HTML page using the code below:
HtmlToPdfConverter htmlToPdfConverter = new HtmlToPdfConverter();
byte[] outPdfBuffer = htmlToPdfConverter.ConvertUrl(url);
Response.AddHeader("Content-Type", "application/pdf");
Response.AddHeader("Content-Disposition", String.Format("attachment; filename=Merge_HTML_with_Existing_PDF.pdf; size={0}", outPdfBuffer.Length.ToString()));
Response.BinaryWrite(outPdfBuffer);
Response.End();
This produces a PDF document but I have another PDF document that I would like to use as cover page in the final PDF document.
One possibility I was thinking about was to create the PDF document and then merge my cover page PDF with the PDF produced by the converter, but this looks like an inefficient solution. Saving the PDF and loading it back for the merge seems to introduce unnecessary overhead. I would like to merge the cover page while the PDF document produced by the converter is still in memory.
The following line added in your code right after you create the HTML to PDF converter object should do the trick:
// Set the PDF file to be inserted before conversion result
htmlToPdfConverter.PdfDocumentOptions.AddStartDocument("CoverPage.pdf");

PDF acroform fields become non editable in Adobe reader after writing to it using Pdfbox APIs

I am reading a PDF which has editable fields and the fields can be edited by opening it through Adobe Reader. I am using PDFBox APIs to generate an output PDF with data filled for the editable fields in input PDF. The output PDF can be opened using Adobe Reader and I am able to see the field values but I am unable to edit those fields directly from Adobe reader.
There is also a JIRA ticket for this issue and it is unresolved according to this link :
https://issues.apache.org/jira/browse/PDFBOX-1121
Can anybody please tell me if this got resolved? Also, if possible please answer the following questions related to my question:
Is there any protection policy or access permission that I need to explicitly set in order to edit the output PDF from Adobe reader?
Every time I open the PDF that was written to using pdfbox APIs, I get this message prompt:
" The document has been changed since it was created and use of extended features is no longer available...."
I am using PdfBox 1.8.6 jar and Adobe reader 11.0.8. I would really appreciate if anybody could help me with this issue.
Code snippet added to aid responders in debugging :
String outputFileNameWithPath = "C:\\myfolder\\testop.pdf"; // backslashes must be escaped in Java string literals
PDDocument pdf = null;
pdf = PDDocument.load( outputFileNameWithPath );
PDDocumentCatalog docCatalog = pdf.getDocumentCatalog();
PDAcroForm acroForm = docCatalog.getAcroForm();
//The map pdfValues is a collection of the data that I need to set in the PDF
//I am unable to go into the details of my data source
// The key in the data map corresponds to the PDField's expanded name and data
// corresponds to the data that I am trying to set.
Iterator<Entry<String, String>> iter=pdfValues.entrySet().iterator();
String name=null;
String value=null;
PDField field=null;
//Iterate over all data and see if the PDF has a matching field.
while(iter.hasNext()) {
    Map.Entry<String, String> currentEntry=iter.next();
    name=currentEntry.getKey();
    value=currentEntry.getValue();
    if(name!=null) {
        name=CommonUtils.fromSchemaNameToPdfName(name);
        field=acroForm.getField(name);
    }
    if( field != null && value!=null )
    {
        field.setValue( value ); //setting the values once field is found.
    }
}
// Set access permissions / encryption here before saving
pdf.save(outputFileNameWithPath);
Thanks.
The document has been changed since it was created and use of extended features is no longer available....
This indicates that the original form has been Reader-enabled, i.e. an integrated Usage-Rights digital signature has been applied to the document using a private key held by Adobe which tells the Adobe Reader that it shall make some extra functionality available to the user viewing that form.
If you don't want to break that signature during form fill-ins with PDFBox, you need to make sure that you
don't make any changes other than form fill-ins and
save the changes as an incremental update.
If you provided your form fill-in code and your source PDF, this could be analyzed in more detail.