how to apply digital signature on all pages of pdf using PDFBox library [duplicate] - pdfbox

I want to place same externally signed signature container (signature value) at multiple places in a PDF.
I have referred the page 'How to place the Same Digital signatures to Multiple places in PDF using itextsharp.net'.
While working with the above mentioned work-around, I observed that whenever I tried to place multiple signatures on single page like 4-5 times, it never worked. Always shows only one valid signature field and other fields as unsigned (unsigned PDF form fields). So couldn't understand the problem.
Now I wanted to know whether any reference material is available to see how PdfLiteral and PdfIndirectReference works? I have gone through the itextsharp reference document but couldn't get enough information. In addition to this is there any limitation on how many annotations/signature fields one can add in a PDF?
And If I have to use BlankSignatureContainer and MakeSignature.SignDeferred then how the signature will get attached to all the fields because in,
MakeSignature.SignDeferred(pdfreader, "Sig", output, externalcontainer)
we have to pass only one signature field name.
Thank you.

You are asking for something of which mkl wrote:
Beware: While this procedure creates something which does not violate
the letter of the PDF specifications (which only forbid the cases
where the same field object is referenced from multiple pages, be it
via the same or via distinct widgets), it clearly does violate its
intent, its spirit. Thus, this procedure might also become forbidden as part of a Corrigenda document for the specification.
Actually, what you are asking does violate the specification. See section 12.7.5.5 of the ISO standard for PDF:
Allow me to repeat the last line of this screen shot:
signature fields shall never refer to more than one annotation.
There is a shall in this sentence, not a should. A should isn't normative. It means that you should or shouldn't do something, but that you are not in violation with the spec if don't or do. Not respecting results in a PDF document that is in violation with the PDF specification, and that in the strict sense isn't a real PDF file.
That is a path you don't want to go, because being in violation with the PDF specification voids your right to use a series of PDF patents owned by Adobe. Adobe owns patents that can be used by everyone for free (perpetual, non-exclusive, royalty-free,...) on condition that you respect the ISO specification.
For that reason, please do not expect an answer to your question, except for the recommendation to abandon your requirement. PDF viewers that comply with the PDF specification won't expect a single signature to be placed at different locations because that's not allowed by the spec, so even if you would adapt your software to create more than one widget annotation / appearance for a single signature field, there is no guarantee that a PDF viewer will understand what you're trying to do.

Related

PDF Acrobat Verifying Signature

I am verifying a PDF with two signatures (Adobe Acrobat), both valid. One of them has a text say "cambio(s) varios" (my Adobe Acrobat is in Spanish) translating to Enghish "change(s) various", my question is I don´t know what it mean. Signatures are valid and the PDF is correct.
Thanks in advance
First of all, to outline what this is about, the Adobe Acrobat Reader signature panel looks like this for the document at hand
and the question is about the
1 Miscellaneous Change(s)
in-between.
According to Adobe Documentation
In a number of documents Adobe enumerates possible modification entries and characterizes "Miscellaneous Change(s)" like this:
Miscellaneous: Some changes which occur in memory or cannot be explicitly listed are labelled miscellaneous.
(e.g. in "Digital Signatures Workflow Guide for the Adobe® Acrobat Family of Products")
Now this documentation obviously is no help at all...
According to Adobe Acrobat
Fortunately Adobe Acrobat can be asked to show "Document Integrity Properties":
(Adobe Acrobat 9.5 output on "Signature Properties" - "Legal" - "View Document Integrity Properties...")
I assume it is this detail that makes Adobe Reader warn about miscellaneous changes.
In Your Document
Looking for a transfer function use in your document one quickly indeed finds one in a ExtGState resource of page 1:
The TR entry in that graphics state dictionary sets the transfer function here.
Interestingly the transfer function used is the Identity function! I assume that in most normal use cases setting the transfer function to Identity changes nothing...
What to Do
Thus, I would propose you change your original document creation to not include transfer functions, in particular not Identity transfer functions. Alternatively pre-process your documents before applying the first signature and remove such functions.

PDF Copy Text Issue: Weird Characters

I tried to copy text from a PDF file but get some weird characters. Strangely, Okular can recoqnize the text, but not with Sumatra PDF or Adobe, all three applications are installed in Windows 10 64 bit. To better explain my issue, here is the video https://streamable.com/sw1hc. The "text layer workaround file" is one solution I got. Any help is greatly appreciated. Regards
In short: The (original) PDF does not contain the information required for regular text extraction as described in the PDF specification. Depending on the exact nature of your task, you might try to add the required information to the existing text objects and fonts or you might go for OCR.
Mapping character codes to Unicode as described in the PDF specification
The PDF specification ISO 32000-1 (and similarly ISO 32000-2, too) describes an algorithm for mapping character codes to Unicode values using information available directly inside the PDF.
It has been quoted very often in other stack overflow answers (see here, here, here, here, here, or here), so I won't quote it here again.
Essentially this is the algorithm used by Adobe Acrobat during copy&paste and also by many other text extractors.
In PDFs which don't contain the information required for text extraction, you eventually get to this point in the algorithm:
If these methods fail to produce a Unicode value, there is no way to determine what the character code represents in which case a conforming reader may choose a character code of their choosing.
What happens if the algorithm above fails to produce a Unicode value
This is where the text extraction implementations differ, they try to determine the matching Unicode value by using heuristics or information from beyond the PDF or applying OCR to the glyph in question.
That the different programs you tried returned so different results shows that
your PDF does not contain the information required for the algorithm above from the PDF specification and
the heuristics used by those programs differ relevantly and Okular's heuristics work best for your document.
What to do in such a case
There are multiple options, more or less feasible depending on your concrete case:
Ask the source of the PDF for a version that contains proper information for text extraction.
Unless you have a contract with that source that requires them to supply the PDFs in a machine readable form or the source is otherwise obligated to do so, they usually will decline, though...
Apply OCR to the PDF in question.
Depending on the quality of the OCR software and the glyphs in the PDF, the results can be of a questionable quality; e.g. in your "PDF copy text issue-Text layer workaround.pdf" the header "Chapter 1: Derivative Securities" has been recognized as "Chapter1: Deratve Securites"...
You can try to interactively add manually created ToUnicode maps to the PDF, e.g. as described by Tilman Hausherr in his answer to "how to add unicode in truetype0font on pdfbox 2.0.0".
Depending on the number of different fonts you have to create the mappings for, this approach might easily require way too much time and effort...

Can one remove or revoke digital signature on PDF document?

I have a theoretical question about PAdES. I want to know if it is possible to revoke a signature in PDF or remove it?
I don't know what exactly you technically mean by revoking a signature.
But it clearly is possible to remove a signature: An integrated PDF signature usually consists of a signature form field with a value that contains a CMS signature container.
You have the choice of either removing only that value or the whole field with the value.
The former option leaves an empty signature field, which can easily be used for a new signature with a visualization at the same location as your original signature (if it has any to start with).
The latter option removes your signature completely.
Two caveats, though:
If you don't merely want the signature not to appear anymore, make sure that
you don't save this edit as an incremental update - if it was done as an incremental update, the document version with your signature could easily be restored;
you don't merely remove the reference to the the value from the signature field but that you actually clear the value object - the signature value object might be referenced from other locations in the PDF, too, so if you don't clear it, its information might remain accessible inside the PDF.
If your PDF contains multiple signatures or document timestamps, and if the signature you want to remove is not the newest one, manipulating it will break at least all newer signatures / time stamps. This is due to the way multiple signatures are applied to PDFs:
As you can recognize in this sketch, the bytes signed by newer signatures contain all older signatures.
In such a situation, therefore, don't only implement "remove a single target signature" but instead "remove all signature starting at a single target signature".
For some more technical backgrounds on integrated PDF signatures cf. this answer and documents referenced from there.

Verify Electronic Signatures with PDFBox

My company has requested a Java web service implementation of extracting data from PDF forms to initiate straight through processing capabilities for client operations using Apache PDFBox. Easy enough. The tough part is that forms are being submitted from clients of my firm on behalf of end customers, but the end customer signature has to be validated.
The business case for signing these forms is through informal electronic signature (digital representation of a wet signature) processes like the signature "stamp" in Adobe Reader with an image of the customer's signature, or touch screen drawing on an iPad. So far, I have been unable to consistently validate this type of signature, and even been unable to consistently maintain the PDF state such that it can still be read by PDFBox after this type of signature ceremony.
Validating signatures through the digital signature form field is trivial, and I have communicated that to our business. However, since the signer in those cases is typically the owner of the digital cert on whatever machine is being used and the assumption is that most of these interactions will take place in the client office.
I've got a few choices here:
Figure out how to identify electronic signatures consistently and reproduce the lossless signing ceremony for client education.
Make a change to the digital signature form field if possible to accept electronic signatures, if that's even possible.
I have a slight workaround using the most recent release of Acrobat in putting an image form field over the signature area, which works great except for one thing: all the software I've tried reads this form field type as a button. Is there any way to force it to recognize an image, or any PDF reading software that is more up to date and can detect those fields?
I would like to upload a couple sample PDFs, but of course they're all company proprietary information. Suffice it to say that we don't have any wizards doing amazing things with the forms... they are all your basic AcroForms and I'm trying to figure out how to configure the signature area.
Thank you.
Concerning your actual question:
I have a slight workaround using the most recent release of Acrobat in putting an image form field over the signature area, which works great except for one thing: all the software I've tried reads this form field type as a button. Is there any way to force it to recognize an image, or any PDF reading software that is more up to date and can detect those fields?
Any PDF reading software that recognizes those fields as a button is up-to-date, at least in that respect, because... there are no "image form fields" in the PDF file format!
Some PDF creators emulate image form fields using a button form field which by means of JavaScript gets the behavior of an image form field. This emulation is incomplete, of course. In particular the image in such a field is not the value of the form field but merely its appearance.
Thus, if you want to implement reading the value from such an emulated image form field, you have to extract the appearance of the button.
Some remarks on the whole scenario:
... the end customer signature has to be validated.
The business case for signing these forms is through informal electronic signature (digital representation of a wet signature) processes
In contrast to certificate based digital signatures you can hardly do anything with such signatures that deserves to be called "Validation".
Ok, you can look for an image in the PDF in some emulated image field, but you have no guarantee that the person whose wet signature can be seen on that image backs the data in that form let alone has indeed signed it personally. Just as likely someone else simply has scanned that person's signature from some different hand-signed document and filled the form using that scan...
So far, I have been unable to consistently validate this type of signature
It should be possible to extract most such wet signatures
either as bitmap images added directly or indirectly to the page content,
or as bitmap images added directly or indirectly to some annotation appearance (e.g. a button),
or as an InkList or Path of an Ink annotation,
or as the Vertices or Path of a PolyLine annotation.
In case of bitmap images don't forget to also extract the image mask if applicable. Numerous applications fill the base image with the pen color and contain the actual signature graph in the mask.
and even been unable to consistently maintain the PDF state such that it can still be read by PDFBox after this type of signature ceremony.
That sounds like a misbehavior of the software that executed that signing ceremony. Unless you share examples for that, though, one can hardly help you analyze the problem.

Understanding the PDF DOM

I am writing an application that has to read and interpret data stored in some PDF files. The reading part is done but I am only able to get a dump of all the words on a page and not the format of the words. What I mean is that if I have to extract a table, I am getting the numbers in the table but not the markup which defines the table.
Further, there is some formatting used which displays a few of these numbers within parentheses (meaning that those numbers are negative) but the parentheses themselves are not part of the text. Hence, I am not able to distinguish between positive and negative numbers present in the PDF table!
How do you get the PDF markup along with the text? Is a PDF similar in structure to an XML with tags used to markup tables etc.? If not, then, is there a resource which describes the salient features of the PDF DOM?
I am using VBA and the Acrobat library (AcroExch etc.)
There is no such thing as "PDF markup" in the sense of HTML etc. A table in PDF cannot be distinguished from line art, other than by using OCR, which can be error-prone if the layout is complex. It is simply drawn using geometrical shapes, like in a vector-based graphics program.
"Is a PDF similar in structure to an XML with tags used to markup tables etc.?"
No, not at all.
And there is no such thing as a 'DOM' either. Google for a file named *PDF32000_2008.pdf*. The current PDF specification for v1.7 (ISO spec) is that file. You should be able to locate it on the Adobe website.
As omz stated, text inside PDF does not really have a structure. You can take a look on the specification here. However, for some very specific files, there is something called PDF Tags, or PDF Marked Content, which is fairly new, and it aims to give PDF documents some kind of structure. If you target this kind of files specifically, you might be able to achieve something. Take a look on chapter 10 (Document Interchange) of the Adobe's specification for further details.
Maybe what you want to achieve can be done with less effort and faster by using TET, the Text Extraction Toolkit made by the fine folks from pdflib.com ( http://www.pdflib.com/products/tet/ ) ??
AFAIR, the TET has some (limited) support for table detection as well....