Opening a content stream blanks saved content? - pdfbox

I am trying to modify an existing PDF by adding some text to the header of each page. But even the simple sample code I have below ends up generating me a blank PDF as output:
document = PDDocument.load(new File("c:/tmp/pdfbox_test_in.pdf"));
PDPage page = (PDPage) document.getDocumentCatalog().getAllPages().get(0);
PDPageContentStream contentStream = new PDPageContentStream(document, page);
/*
contentStream.beginText();
contentStream.setFont(font, 12);
contentStream.moveTextPositionByAmount(100, 100);
contentStream.drawString("Hello");
contentStream.endText();
*/
contentStream.close();
document.save("c:/tmp/pdfbox_test_out.pdf");
document.close();
(same result whether the commented block is executed or not).
So how is simply opening the content stream and closing it enough to blank the saved document? Is there some other API calls I need to be making in order to not have the content be stripped out?
Surprisingly, I couldn't find a PDFBox recipe for this type of change.

You use
PDPageContentStream contentStream = new PDPageContentStream(document, page);
This constructor is implemented like this:
public PDPageContentStream(PDDocument document, PDPage sourcePage) throws IOException
{
this(document, sourcePage, false, true);
}
which in turn calls this
/**
* Create a new PDPage content stream.
*
* #param document The document the page is part of.
* #param sourcePage The page to write the contents to.
* #param appendContent Indicates whether content will be overwritten. If false all previous content is deleted.
* #param compress Tell if the content stream should compress the page contents.
* #throws IOException If there is an error writing to the page contents.
*/
public PDPageContentStream(PDDocument document, PDPage sourcePage, boolean appendContent, boolean compress) throws IOException
So that two-parameter constructor always uses appendContent = false which causes all previous content to be deleted.
Thus, you should instead use
PDPageContentStream contentStream = new PDPageContentStream(document, page, true, true);
to append to the current content.

Ugh, apparently the version of PDFBox we are using in our project needs to be upgraded. I just noticed that the latest API has the constructor I need:
public PDPageContentStream(PDDocument document, PDPage sourcePage, boolean appendContent, boolean compress)
So changing to this constructor and using appendContent=true, I got the above sample working.
Time for an upgrade...

Related

itext html to pdf without embedding fonts

I'm following this guide in Chapter 6 of iText 7: Converting HTML to PDF with pdfHTML on adding extra fonts:
public static final String FONT = "src/main/resources/fonts/cardo/Cardo-Regular.ttf";
public void createPdf(String src, String font, String dest) throws IOException {
ConverterProperties properties = new ConverterProperties();
FontProvider fontProvider = new DefaultFontProvider(false, false, false);
FontProgram fontProgram = FontProgramFactory.createFont(font);
fontProvider.addFont(fontProgram, "Winansi");
properties.setFontProvider(fontProvider);
HtmlConverter.convertToPdf(new File(src), new File(dest), properties);
}
While it's working as expected and embedding subsets of the fonts being used, I'm wondering if there is a way for the resulting PDF document to not embed the fonts at all. This is possible when creating BaseFont instances and setting the embedded property to false and using them to build various PDF building blocks. What I'm looking for is this same behavior when using the HtmlConverter.convertToPdf().
What you should normally do is override FontProvider:
FontProvider fontProvider = new DefaultFontProvider(false, false, false) {
#Override
public boolean getDefaultEmbeddingFlag() {
return false;
}
};
However, the problem is that at the moment this font provider would be overwritten by pdfHTML further into the pipeline in ProcessorContext#reset.
While this issue is not fixed in iText you can build a custom version of pdfHTML for your needs. The repo is located at https://github.com/itext/i7j-pdfhtml and you are interested in this line. Just replace it with the overload as above and build the jar.
UPD The fix is available starting from pdfHTML 2.1.3. From that version on you can use custom font providers freely.

PDFBox Signature Field not well recognized

I'm going in trouble using PDFBox 2.0.0-RC3 and producing a digital signature field into a PDF.
This is the piece of code i use:
public static void main(String[] args) throws IOException, URISyntaxException
{
PDDocument document;
document = new PDDocument();
PDPage page = new PDPage(PDRectangle.A4);
document.addPage(page);
PDAcroForm acroForm = new PDAcroForm(document);
document.getDocumentCatalog().setAcroForm(acroForm);
PDSignatureField signatureBox = new PDSignatureField(acroForm);
signatureBox.setPartialName("ENSGN-MY_SIGNATURE_FIELD-001");
acroForm.getFields().add(signatureBox);
PDAnnotationWidget widget = signatureBox.getWidgets().get(0);
PDRectangle rect = new PDRectangle();
rect.setLowerLeftX(50);
rect.setLowerLeftY(750);
rect.setUpperRightX(250);
rect.setUpperRightY(800);
widget.setRectangle(rect);
page.getAnnotations().add(widget);
try {
document.save("/tmp/mySignatureFieldGEN_PDFBOX.pdf");
document.close();
} catch (Exception io) {
System.out.println(io);
}
}
The code generates a pdf document, i open it with Acrobat Reader and this is the result:
PDF BOX Generated
As you can see, the signature panel on the left is void but the signature field on the left is present and works.
I generate the same PDF with PDFTron. This is the result:
PDF Tron Generated
In this case the signature panel on the left show correctly the presence of the signature field.
I would like to obtain this second case (correct) but i don't understand why PDF Box can do this.
Many thanks
add this:
widget.setPage(page);
This sets the /P entry.
Now the panel on the left appears. How did I get the idea? I got a document with such an empty signature field (from here), and compared it with yours with PDFDebugger.

Issues with iTextsharp and pdf manipulation

I am getting a pdf-document (no password) which is generated from a third party software with javascript and a few editable fields in it. If I load this pdf-document with the pdfReader class the NumberOfPagesProperty is always 1 although the pdf-document has 17 pages. Oddly enough the document has 17 pages if I save the stream afterwards. When I now try to open the document the Acrobat Reader shows an extended feature warning and the fields are not fillable anymore (I haven't flattened the document). Do anyone know about such a problem?
Background Info:
My job is to remove the javascript code, fill out some fields and save the document afterwards.
I am using the iTextsharp version 5.5.3.0.
Unfortunately I can't upload a sample file because there are some confidental data in it.
private byte[] GetDocumentData(string documentName)
{
var document = String.Format("{0}{1}\\{2}.pdf", _component.OutputDirectory, _component.OutputFileName.Replace(".xml", ".pdf"), documentName);
if (File.Exists(document))
{
PdfReader.unethicalreading = true;
using (var originalData = new MemoryStream(File.ReadAllBytes(document)))
{
using (var updatedData = new MemoryStream())
{
var pdfTool = new PdfInserter(originalData, updatedData) {FormFlattening = false};
pdfTool.RemoveJavascript();
pdfTool.Save();
return updatedData.ToArray();
}
}
}
return null;
}
//Old version that wasn't working
public PdfInserter(Stream pdfInputStream, Stream pdfOutputStream)
{
_pdfInputStream = pdfInputStream;
_pdfOutputStream = pdfOutputStream;
_pdfReader = new PdfReader(_pdfInputStream);
_pdfStamper = new PdfStamper(_pdfReader, _pdfOutputStream);
}
//Solution
public PdfInserter(Stream pdfInputStream, Stream pdfOutputStream, char pdfVersion = '\0', bool append = true)
{
_pdfInputStream = pdfInputStream;
_pdfOutputStream = pdfOutputStream;
_pdfReader = new PdfReader(_pdfInputStream);
_pdfStamper = new PdfStamper(_pdfReader, _pdfOutputStream, pdfVersion, append);
}
public void RemoveJavascript()
{
for (int i = 0; i <= _pdfReader.XrefSize; i++)
{
PdfDictionary dictionary = _pdfReader.GetPdfObject(i) as PdfDictionary;
if (dictionary != null)
{
dictionary.Remove(PdfName.AA);
dictionary.Remove(PdfName.JS);
dictionary.Remove(PdfName.JAVASCRIPT);
}
}
}
The extended feature warning is a hint that the original PDF had been signed using a usage rights signature to "Reader-enable" it, i.e. to tell the Adobe Reader to activate some additional features when opening it, and the OP's operation on it has invalidated the signature.
Indeed, he operated using
_pdfStamper = new PdfStamper(_pdfReader, _pdfOutputStream);
which creates a PdfStamper which completely re-generates the document. To not invalidate the signature, though, one has to use append mode as in the OP's fixed code (for char pdfVersion = '\0', bool append = true):
_pdfStamper = new PdfStamper(_pdfReader, _pdfOutputStream, pdfVersion, append);
If I load this pdf-document with the pdfReader class the NumberOfPagesProperty is always 1 although the pdf-document has 17 pages. Oddly enough the document has 17 pages
Quite likely it is a PDF with a XFA form, i.e. the PDF is only a carrier of some XFA data from which Adobe Reader builds those 17 pages. The actual PDF in that case usually only contains one page saying something like "if you see this, your viewer does not support XFA."
For a final verdict, though, one has to inspect the PDF.

Unable to get page number when using RenderListener interface to find a piece of text in PDF

iText requires coordinates to create form fields and Page Number in existing PDFs at different places.
My PDF is dynamic. So I decided to creat the PDF with some identifier text. And use TextRenderInfo to find the coordinates for the text and use those coordinates to creat the textfields and other form fields.
ParsingHelloWorld.java
public void extractText(String src, String dest) throws IOException, DocumentException {
PrintWriter out = new PrintWriter(new FileOutputStream(dest));
PdfReader reader = new PdfReader(src);
PdfStamper stp = new PdfStamper(reader, new FileOutputStream(dest);
RenderListener listener = new MyTextRenderListener(out,reader,stp);
PdfContentStreamProcessor processor = new PdfContentStreamProcessor(listener);
for ( int pageNum= 0; pageNum < reader.getNumberOfPages(); pageNum++ ){
PdfDictionary pageDic = reader.getPageN(pageNum);
PdfDictionary resourcesDic = pageDic.getAsDict(PdfName.RESOURCES);
processor.processContent(ContentByteUtils.getContentBytesForPage(reader, pageNum), resourcesDic);
}
out.flush();
out.close();
stp.close();
}
MyTextRenderListener.java
public void renderText(TextRenderInfo renderInfo) {
if (renderInfo.getText().startsWith("Fill_in_TextField")){
// creates the text fields by getting co-ordinates form the renderinfo object.
createTextField(renderInfo);
}else if (renderInfo.getText().startsWith("Fill_in_SignatureField")){
// creates the text fields by getting co-ordinates form the renderinfo object.
createSignatureField(renderInfo);
}
}
The problem is I have a page number in extractText method in the ParsingHelloWorld class.
When the renderText method is called inside the MyTextRenderListener class internally processing the page content, I couldn't get the pageNumber to generate the fields in the PDF at the particular coordinates where the identifier text resides(ex Fill_in_TextField,Fill_in_SignatureField..etc ).
Any suggestions/ ideas to get the page number in my scenario.
Thanks in advance.
That's easy. Add a parameter to MyTextListener:
protected int page;
public void setPage(int page) {
this.page = page;
}
Now when you loop over the pages in ParsingHelloWorld, pass the page number to MyTextListener:
listener.setPage(pageNum);
Now you have access to that number in the renderText() method and you can pass it to your createTextField() method.
Note that I think your loop is wrong. Page numbers don't start at page 0, they start at page 1.

What's the easiest way of converting an xhtml string to PDF using Flying Saucer?

I've been using Flying Saucer for a while now with awesome results.
I can set a document via uri like so
ITextRenderer renderer = new ITextRenderer();
renderer.setDocument(xhtmlUri);
Which is nice, as it will resolve all relative css resources etc relative to the given URI. However, I'm now generating the xhtml, and want to render it directly to a PDF (without saving a file). The appropriate methods in ITextRenderer seem to be:
private Document loadDocument(final String uri) {
return _sharedContext.getUac().getXMLResource(uri).getDocument();
}
public void setDocument(String uri) {
setDocument(loadDocument(uri), uri);
}
public void setDocument(Document doc, String url) {
setDocument(doc, url, new XhtmlNamespaceHandler());
}
As you can see, my existing code just gives the uri and ITextRenderer does the work of creating the Document for me.
What's the shortest way of creating the Document from my formatted xhtml String? I'd prefer to use the existing Flying Saucer libs without having to import another XML parsing jar (just for the sake of consistent bugs and functionality).
The following works:
Document document = XMLResource.load(new ByteArrayInputStream(templateString.getBytes())).getDocument();
Previously, I had tried
final DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
dbf.setValidating(false);
final DocumentBuilder documentBuilder = dbf.newDocumentBuilder();
Document document = documentBuilder.parse(new ByteArrayInputStream(templateString.getBytes()));
but that fails as it attempts to download the HTML docType from http://www.w3.org (which returns 503's for the java libs).
I use the following without problem:
final DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
documentBuilderFactory.setValidating(false);
DocumentBuilder builder = documentBuilderFactory.newDocumentBuilder();
builder.setEntityResolver(FSEntityResolver.instance());
org.w3c.dom.Document document = builder.parse(new ByteArrayInputStream(doc.toString().getBytes()));
ITextRenderer renderer = new ITextRenderer();
renderer.setDocument(document, null);
renderer.layout();
renderer.createPDF(os);
The key differences here are passing in a null URI, and also provided the DocumentBuilder with an entity resolver.