Why am I getting NullPointerException using PDField.setValue() from PDFBox 2.0.19?

I am trying to change text in a field of a PDF file but keep getting NullPointerException. I'm using PDFBox 2.0.19.
The file loads fine; I can, for example, add a page and save a new copy to the Desktop, but setting a field's value keeps crashing.
Can you help me please? What am I doing wrong?
public static void main(String args[]) throws IOException {
File file = new File("C:/Users/Bondi/Desktop/karta.pdf");
PDDocument pdDocument = PDDocument.load(file);
PDDocumentCatalog pdDocumentCatalog = pdDocument.getDocumentCatalog();
PDAcroForm pdAcroForm = pdDocumentCatalog.getAcroForm();
if (pdAcroForm != null) {
PDField pdField = (PDField) pdAcroForm.getField("imie_badacza");
pdField.setValue("Badacz");
}
pdDocument.save("C:/Users/Bondi/Desktop/karta2.pdf");
pdDocument.close();
}
(The original post included a screenshot of the error and a screenshot of the PDF's field hierarchy.)

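You should use the fully qualified field name. getField() returns null when no field has the given name, and calling setValue() on that null result throws the NullPointerException. I.e. instead of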
PDField pdField = (PDField) pdAcroForm.getField("imie_badacza");
use
PDField pdField = (PDField) pdAcroForm.getField("topmostSubform.Page1.imie_badacza");
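A fully qualified name is the chain of partial names from the top-level field down to the target field, joined with dots. The following is only a minimal sketch of that idea in plain Java: the Node class and the fullyQualifiedName helper are hypothetical illustrations and not part of PDFBox, and the three-level hierarchy is assumed from the name used in the answer above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class FieldNameSketch {

    /** Hypothetical stand-in for a form field node; this is NOT a PDFBox class. */
    static final class Node {
        final String partialName;
        final Node parent; // null for a top-level field

        Node(String partialName, Node parent) {
            this.partialName = partialName;
            this.parent = parent;
        }
    }

    /** Joins the partial names from the root down to the given node with '.'. */
    static String fullyQualifiedName(Node node) {
        Deque<String> parts = new ArrayDeque<>();
        // Walk up to the root, pushing each name to the front so the root ends up first.
        for (Node n = node; n != null; n = n.parent) {
            parts.addFirst(n.partialName);
        }
        return String.join(".", parts);
    }

    public static void main(String[] args) {
        // Hierarchy assumed from the answer: topmostSubform > Page1 > imie_badacza
        Node root = new Node("topmostSubform", null);
        Node page = new Node("Page1", root);
        Node field = new Node("imie_badacza", page);
        System.out.println(fullyQualifiedName(field));
    }
}
```

In real code it is also worth checking the result of getField() for null before calling setValue(), so that a mistyped name produces a clear message instead of a NullPointerException.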


Understanding loading of font in PDFBox 2.0

I have finally succeeded in making PDFBox print my Unicode characters.
But now, I would like to understand the solution that I have come up with.
The code below works and prints a ≥ to the page.
Two things do not work:
changing
PDType0Font.load(documentMock, systemResourceAsStream, true);
to
PDType0Font.load(documentMock, systemResourceAsStream, false);
changing
final PDFont robotoLight = loadFontAlternative("Roboto-Light.ttf");
to
final PDFont robotoLight = loadFont("Roboto-Light.ttf");
The first change prints two dots instead of the character.
What does embedSubset do, since it does not work when set to false?
The documentation is too sparse for me to understand.
The second change gives the following exception: Exception in thread "main" java.lang.IllegalArgumentException: U+2265 is not available in this font's encoding: WinAnsiEncoding
This problem has been covered in many other questions that predate PDFBox 2.0, when there was a bug in the handling of Unicode characters.
So, they do not answer the question directly.
That aside, the problem is clear: I should not set the encoding to WinAnsiEncoding but to something different.
But what should the encoding be? And why is there no UTF-8 encoding or similar available?
There is no documentation in COSName about the many options.
public class SimpleReportUnicode {
public static void main(String[] args) throws IOException {
PDDocument report = createReport();
final String fileLocation = "c:/SimpleFormUnicode.pdf";
report.save(fileLocation);
report.close();
}
private static PDDocument createReport() throws IOException {
PDDocument document = new PDDocument();
PDPage page = new PDPage();
document.addPage(page);
PDPageContentStream contentStream = new PDPageContentStream(document, page);
final PDFont robotoLight = loadFontAlternative("Roboto-Light.ttf");
writeText(contentStream, robotoLight, 100, 650);
contentStream.close();
return document;
}
private static void writeText(PDPageContentStream contentStream, PDFont font, double x, double y) {
try {
contentStream.beginText();
contentStream.setFont(font, 12);
contentStream.moveTextPositionByAmount((float) x, (float) y);
String unicode = "≥";
contentStream.showText(unicode);
contentStream.endText();
}
catch (IOException e) {
}
}
private static PDFont loadFont(String location) {
PDFont font;
try {
PDDocument documentMock = new PDDocument();
InputStream systemResourceAsStream = ClassLoader.getSystemResourceAsStream(location);
Encoding encoding = Encoding.getInstance(COSName.WIN_ANSI_ENCODING);
font = PDTrueTypeFont.load(documentMock, systemResourceAsStream, encoding);
}
catch (IOException e) {
throw new RuntimeException("IO exception");
}
return font;
}
private static PDFont loadFontAlternative(String location) {
PDDocument documentMock = new PDDocument();
InputStream systemResourceAsStream = ClassLoader.getSystemResourceAsStream(location);
PDFont font;
try {
font = PDType0Font.load(documentMock, systemResourceAsStream, true);
}
catch (IOException e) {
throw new RuntimeException("IO exception");
}
return font;
}
}
EDIT
If you want to use the same font as in the code, Roboto is available here:
https://fonts.google.com/specimen/Roboto
Add Roboto-Light.ttf to your classpath and the code should work out of the box.
As discussed in the comments:
The problem with embedSubsets went away by using version 2.0.7 (btw, 2.0.8 was released today);
The problem "U+2265 is not available in this font's encoding: WinAnsiEncoding" is explained in the FAQ, and the solution is to use PDType0Font.load(), which you already did in your working version;
There is no UTF-8 encoding for fonts because it isn't available in the PDF specification;
Using embedSubsets true produces a 4KB file; with false the file is 100KB because the full font is embedded, so true is usually the better choice.
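The WinAnsiEncoding error can be made more tangible with a small check in plain Java. This is only a sketch under an assumption: it uses the JDK's windows-1252 charset as an approximation of PDF's WinAnsiEncoding (the two are closely related but not identical), and the canEncode helper is a name made up for this illustration. It shows that a single-byte encoding of this kind simply has no code for U+2265.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class EncodingSketch {

    /** Returns true if the character has a code in the named charset. */
    static boolean canEncode(String charsetName, char c) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        return encoder.canEncode(c);
    }

    public static void main(String[] args) {
        // Assumption: windows-1252 stands in for WinAnsiEncoding here.
        // U+2265 is the greater-than-or-equal sign used in the question.
        System.out.println(canEncode("windows-1252", '\u2265'));
        // A plain Latin letter is representable in the same charset.
        System.out.println(canEncode("windows-1252", 'A'));
    }
}
```

A font loaded with such an encoding can only address the characters the encoding lists, which is why the answer points to PDType0Font.load() instead of a different encoding name.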

Webdings font characters not extracted using pdfbox

I am using pdfbox to get the names of all fonts that are used in a pdf.
So far it has been working well. However, I recently came across a PDF that uses the 'Webdings' font, and PDFBox was not able to identify it. Could anyone help, please?
This is the code I have used:
public static Set<String> extractFonts(String pdfPath) throws IOException
{
PDDocument doc = PDDocument.load(new File(pdfPath));
PDPageTree pages = doc.getDocumentCatalog().getPages();
Set<String> fontSet = new HashSet<String>();
try{
for(PDPage page:pages){
PDResources res = page.getResources();
for (COSName fontName : res.getFontNames())
{
PDFont font = res.getFont(fontName);
if(font != null){
String fontUsedName = font.getName();
if(fontUsedName.contains("+")) {
fontUsedName = fontUsedName.substring(fontUsedName.indexOf("+")+1, fontUsedName.length());
}
fontSet.add(fontUsedName);
}
}
}
} catch (Exception e) {
e.printStackTrace();
}
System.out.println(fontSet);
return fontSet;
}
I know that the font 'Webdings' is present because it is listed under File -> Properties -> Fonts in Adobe Reader.

PDFBox: adding an embedded PDF file and saving the PDDocument to an OutputStream does not keep the embedded files

I'm using PDFBox (1.8.8) to add attachments to a PDF. My problem is that when one of the attachments is of type .pdf and I save the PDDocument to an OutputStream, the final PDF document does not include the attachments. If I save the PDDocument to a file instead of an OutputStream, everything works fine, and if the attachments do not include any PDF, saving to a file and saving to an OutputStream both work fine.
I would like to know if there is any way to add embedded PDF files and save the PDDocument to an OutputStream while keeping the attached files in the final PDF that is generated.
The code I'm using is:
private void insertAttachments(OutputStream out, ArrayList<Attachment> attachmentsResources) {
final PDDocument doc;
Boolean hasPdfAttach = false;
try {
doc = PDDocument.load(new ByteArrayInputStream(((ByteArrayOutputStream) out).toByteArray()));
// final PDFTextStripper pdfStripper = new PDFTextStripper();
// final String text = pdfStripper.getText(doc);
final PDEmbeddedFilesNameTreeNode efTree = new PDEmbeddedFilesNameTreeNode();
final Map embeddedFileMap = new HashMap();
PDEmbeddedFile embeddedFile;
File file = null;
for (Attachment attach : attachmentsResources) {
// first create the file specification, which holds the embedded file
final PDComplexFileSpecification fileSpecification = new PDComplexFileSpecification();
fileSpecification.setFile(attach.getFilename());
file = AttachmentUtils.getAttachmentFile(attach);
final InputStream is = new FileInputStream(file.getAbsolutePath());
embeddedFile = new PDEmbeddedFile(doc, is);
// set some of the attributes of the embedded file
if ("application/pdf".equals(attach.getMimetype())) {
hasPdfAttach = true;
}
embeddedFile.setSubtype(attach.getMimetype());
embeddedFile.setSize((int) (long) attach.getFilesize());
fileSpecification.setEmbeddedFile(embeddedFile);
// now add the entry to the embedded file tree and set in the document.
embeddedFileMap.put(attach.getFilename(), fileSpecification);
// final String text2 = pdfStripper.getText(doc);
}
// final String text3 = pdfStripper.getText(doc);
efTree.setNames(embeddedFileMap);
// ((COSDictionary) efTree.getCOSObject()).removeItem(COSName.LIMITS); (this not work for me)
// attachments are stored as part of the "names" dictionary in the document catalog
final PDDocumentNameDictionary names = new PDDocumentNameDictionary(doc.getDocumentCatalog());
names.setEmbeddedFiles(efTree);
doc.getDocumentCatalog().setNames(names);
// final ByteArrayOutputStream pdfboxToDocumentStream = new ByteArrayOutputStream();
final String tmpfile = "temporary.pdf";
if (hasPdfAttach) {
final File f = new File(tmpfile);
doc.save(f);
doc.close();
//i have try with parser but without success too
// PDFParser parser = new PDFParser(new FileInputStream(tmpfile));
// parser.parse();
// PDDocument doc2 = parser.getPDDocument();
final PDDocument doc2 = PDDocument.loadNonSeq(f, new RandomAccessFile(new File(getHomeTMP()
+ "tempppp.pdf"), "r"));
doc2.save(out);
doc2.close();
} else {
doc.save(out);
doc.close();
}
//that does not work too
// final InputStream in = new FileInputStream(tmpfile);
// IOUtils.copy(in, out);
// out = new FileOutputStream(tmpFile);
// doc.save (out);
} catch (IOException e1) {
e1.printStackTrace();
} catch (Exception e2) {
e2.printStackTrace();
}
}
Best regards
Solution:
private void insertAttachments(OutputStream out, ArrayList<Attachment> attachmentsResources) {
final PDDocument doc;
try {
doc = PDDocument.load(new ByteArrayInputStream(((ByteArrayOutputStream) out).toByteArray()));
((ByteArrayOutputStream) out).reset();
final PDEmbeddedFilesNameTreeNode efTree = new PDEmbeddedFilesNameTreeNode();
final Map embeddedFileMap = new HashMap();
PDEmbeddedFile embeddedFile;
File file = null;
for (Attachment attach : attachmentsResources) {
// first create the file specification, which holds the embedded file
final PDComplexFileSpecification fileSpecification = new PDComplexFileSpecification();
fileSpecification.setFile(attach.getFilename());
file = AttachmentUtils.getAttachmentFile(attach);
final InputStream is = new FileInputStream(file.getAbsolutePath());
embeddedFile = new PDEmbeddedFile(doc, is);
// set some of the attributes of the embedded file
embeddedFile.setSubtype(attach.getMimetype());
embeddedFile.setSize((int) (long) attach.getFilesize());
fileSpecification.setEmbeddedFile(embeddedFile);
// now add the entry to the embedded file tree and set in the document.
embeddedFileMap.put(attach.getFilename(), fileSpecification);
}
efTree.setNames(embeddedFileMap);
((COSDictionary) efTree.getCOSObject()).removeItem(COSName.LIMITS);
// attachments are stored as part of the "names" dictionary in the document catalog
final PDDocumentNameDictionary names = new PDDocumentNameDictionary(doc.getDocumentCatalog());
names.setEmbeddedFiles(efTree);
doc.getDocumentCatalog().setNames(names);
((COSDictionary) efTree.getCOSObject()).removeItem(COSName.LIMITS);
doc.save(out);
doc.close();
} catch (IOException e1) {
e1.printStackTrace();
} catch (Exception e2) {
e2.printStackTrace();
}
}
You store the new PDF after the original PDF in out:
Look at all the uses of out in your method:
private void insertAttachments(OutputStream out, ArrayList<Attachment> attachmentsResources) {
...
doc = PDDocument.load(new ByteArrayInputStream(((ByteArrayOutputStream) out).toByteArray()));
...
doc2.save(out);
...
doc.save(out);
So you get as input a ByteArrayOutputStream and take its current content as input (i.e. the ByteArrayOutputStream is not empty but already contains a PDF) and after some processing you append the modified PDF to the ByteArrayOutputStream. Depending on the PDF viewer you present this to, you will be shown either the original or the manipulated PDF or a (very correct) error message that the file is garbage.
If you want the ByteArrayOutputStream to contain only the manipulated PDF, simply add
((ByteArrayOutputStream) out).reset();
or (if you are not sure about the state of the stream)
out = new ByteArrayOutputStream();
right after
doc = PDDocument.load(new ByteArrayInputStream(((ByteArrayOutputStream) out).toByteArray()));
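The append behaviour described above can be reproduced without PDFBox. The following is a minimal sketch in plain Java: the rewrite and streamWith helpers are hypothetical names for this illustration, and copying the original bytes back stands in for doc.save(out).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class StreamReuseSketch {

    /** Reads the buffered content and writes a copy back, optionally resetting first. */
    static byte[] rewrite(ByteArrayOutputStream out, boolean resetFirst) {
        byte[] original = out.toByteArray();      // snapshot of the current content
        if (resetFirst) {
            out.reset();                          // discard what is already buffered
        }
        out.write(original, 0, original.length);  // stands in for doc.save(out)
        return out.toByteArray();
    }

    /** Creates a stream that already holds some content, like the incoming out parameter. */
    static ByteArrayOutputStream streamWith(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        out.write(bytes, 0, bytes.length);
        return out;
    }

    public static void main(String[] args) {
        // Without reset() the new content is appended after the old content.
        System.out.println(rewrite(streamWith("%PDF-original"), false).length); // 26
        // With reset() only the new content remains.
        System.out.println(rewrite(streamWith("%PDF-original"), true).length);  // 13
    }
}
```

Without the reset() call the stream ends up holding two documents back to back, which matches the garbage file described above.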
PS: According to the comments the OP tried the above proposed changes to his code without success.
I cannot run the code as presented in the question because it is not self-contained. Thus, I reduced it to the essentials to get a self-contained test:
@Test
public void test() throws IOException, COSVisitorException
{
ByteArrayOutputStream baos = new ByteArrayOutputStream();
try (
InputStream sourceStream = getClass().getResourceAsStream("test.pdf");
InputStream attachStream = getClass().getResourceAsStream("artificial text.pdf"))
{
final PDDocument document = PDDocument.load(sourceStream);
final PDEmbeddedFile embeddedFile = new PDEmbeddedFile(document, attachStream);
embeddedFile.setSubtype("application/pdf");
embeddedFile.setSize(10993);
final PDComplexFileSpecification fileSpecification = new PDComplexFileSpecification();
fileSpecification.setFile("artificial text.pdf");
fileSpecification.setEmbeddedFile(embeddedFile);
final Map<String, PDComplexFileSpecification> embeddedFileMap = new HashMap<String, PDComplexFileSpecification>();
embeddedFileMap.put("artificial text.pdf", fileSpecification);
final PDEmbeddedFilesNameTreeNode efTree = new PDEmbeddedFilesNameTreeNode();
efTree.setNames(embeddedFileMap);
final PDDocumentNameDictionary names = new PDDocumentNameDictionary(document.getDocumentCatalog());
names.setEmbeddedFiles(efTree);
document.getDocumentCatalog().setNames(names);
document.save(baos);
document.close();
}
Files.write(Paths.get("attachment.pdf"), baos.toByteArray());
}
As you can see, PDFBox uses only streams here. The result:
Thus, PDFBox has no problem storing a PDF into which it has embedded a PDF file attachment.
The problem, therefore, most likely has nothing to do with this workflow as such.

How to set interactive PDF form in read-only mode while writing as new PDF by using Apache PDFBox?

I am using the Apache PDFBox library to fill in information in a fillable PDF form (AcroForm). After filling in all the information, I need to write it out as a new PDF file (in a non-editable format).
I tried the setReadOnly() method, which is available in the AccessPermission class, but I am still able to edit the values in the newly created PDF document.
Code:
private static PDDocument _pdfDocument;
public static void main(String[] args) {
String originalPdf = "C:/sample/Original.pdf";
String targetPdf = "C:/sample/target.pdf";
try {
populateAndCopy(originalPdf, targetPdf);
-----------
-----------
-----------
-----------
}
} // Main method completed
private static void populateAndCopy(String originalPdf, String targetPdf) throws IOException, COSVisitorException {
_pdfDocument = PDDocument.load(originalPdf);
_pdfDocument.getNumberOfPages();
_pdfDocument.getCurrentAccessPermission().setCanModify(false);
_pdfDocument.getCurrentAccessPermission().setReadOnly();
System.out.println(_pdfDocument.getCurrentAccessPermission().isReadOnly());
_pdfDocument.save(targetPdf);
_pdfDocument.close();
}
Please help me to fix this issue.
Your code does not set any encryption, that is the problem.
Try this:
AccessPermission ap = new AccessPermission();
ap.setCanModify(false);
ap.setReadOnly();
StandardProtectionPolicy spp = new StandardProtectionPolicy("owner-password", "", ap);
spp.setEncryptionKeyLength(128);
doc.protect(spp);
doc.save(targetPdf);
doc.close();
I've set 128 as the key length because 256 is not supported in 1.8 and 40 is too short.
A user will be able to open the document without a password (see the empty password parameter), but will have only the restricted rights.
public static void main(String[] args) {
try {
String formTemplate = "xyz.pdf";
// load the document
PDDocument pdfDocument = PDDocument.load(new File(formTemplate));
// get the document catalog
PDAcroForm acroForm = pdfDocument.getDocumentCatalog().getAcroForm();
// as there might not be an AcroForm entry a null check is necessary
if (acroForm != null)
{
// Retrieve an individual field and set its value.
PDTextField field1 = (PDTextField) acroForm.getField( "_lastName" );
field1.setValue("pppp");
PDTextField field2 = (PDTextField) acroForm.getField( "_firstName" );
field2.setValue("aaaa");
// flatten() merges the field values into the page content, so they can no longer be edited
acroForm.flatten();
}
// Save and close the filled out form.
pdfDocument.save("xyz.pdf");
pdfDocument.close();
System.out.println("Done!!");
} catch(Exception e) {
e.printStackTrace();
}
}

How does one save the .MoreInfo property of a PDF with iTextSharp?

I currently have the following class, with which I'm trying to add a Hashtable of metadata properties to a PDF. The problem is that, even though it appears to assign the Hashtable to the stamper.MoreInfo property, the MoreInfo property doesn't appear to be saved once the stamper is closed.
public class PdfEnricher
{
readonly IFileSystem fileSystem;
public PdfEnricher(IFileSystem fileSystem)
{
this.fileSystem = fileSystem;
}
public void Enrich(string pdfFile, Hashtable fields)
{
if (!fileSystem.FileExists(pdfFile)) return;
var newFile = GetNewFileName(pdfFile);
var stamper = GetStamper(pdfFile, newFile);
SetFieldsAndClose(stamper, fields);
}
string GetNewFileName(string pdfFile)
{
return fileSystem.GetDirectoryName(pdfFile) + @"\NewFileName.pdf";
}
static void SetFieldsAndClose(PdfStamper stamper, Hashtable fields)
{
stamper.MoreInfo = fields;
stamper.FormFlattening = true;
stamper.Close();
}
static PdfStamper GetStamper(string pdfFile, string newFile)
{
var reader = new PdfReader(pdfFile);
return new PdfStamper(reader, new FileStream(newFile, FileMode.Create));
}
}
Any ideas?
As always, Use The Source.
In this case, I saw a possibility fairly quickly (Java source btw):
public void close() throws DocumentException, IOException {
if (!hasSignature) {
stamper.close( moreInfo );
return;
}
Does this form already have signatures of some sort? Let's see when hasSignature would be true.
That can't be the case with your source. hasSignature is only set when you sign a PDF via PdfStamper.createSignature(...), so that's clearly not it.
Err... how are you checking that your MoreInfo was added? It won't be in the XMP metadata. MoreInfo is added directly to the Doc Info dictionary. You see them in the "Custom" tab of Acrobat (and most likely Reader, though I don't have it handy at the moment).
Are you absolutely sure MoreInfo isn't null, and all its values aren't null?
The Dictionary is just passed around by reference, so any changes (in another thread) would be reflected in the PDF as it was written.
The correct way to iterate through a document's "Doc info dictionary":
PdfReader reader = new PdfReader(somePath);
Map<String, String> info = reader.getInfo();
for (String key : info.keySet()) {
System.out.println( key + ": " + info.get(key) );
}
Note that this will go through all the fields in the document info dictionary, not just the custom ones. Also be aware that changes made to the Map from getInfo() will not carry over to the PDF. The map is new'ed, populated, and returned.
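The last point, that the returned map is a fresh copy, can be illustrated without iText. The following is only a sketch in plain Java: FakeReader is a hypothetical stand-in and not the real PdfReader; it merely mimics the "new'ed, populated, and returned" behaviour described above.

```java
import java.util.HashMap;
import java.util.Map;

public class InfoCopySketch {

    /** Hypothetical stand-in for a reader that owns a document info dictionary. */
    static final class FakeReader {
        private final Map<String, String> info = new HashMap<>();

        FakeReader() {
            info.put("Title", "Original title");
        }

        /** Returns a freshly built copy, so callers cannot alter the internal state. */
        Map<String, String> getInfo() {
            return new HashMap<>(info);
        }
    }

    public static void main(String[] args) {
        FakeReader reader = new FakeReader();
        Map<String, String> copy = reader.getInfo();
        copy.put("Title", "Changed");   // modifies only the copy
        copy.put("Custom", "value");    // likewise never reaches the reader
        System.out.println(reader.getInfo().get("Title"));
        System.out.println(reader.getInfo().containsKey("Custom"));
    }
}
```

To actually change the metadata, the values have to be handed to the stamper (for example through MoreInfo, as in the question) before it is closed.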