I have a requirement for a PDF multi-signing workflow: the PDF is signed multiple times without other changes to the document, i.e. two or more people can sign the same document.
I am trying to add signatures to a PDF twice, but after signing the PDF a second time the first signature becomes invalid. I use the PDFBox Java API to create the signatures.
PDF creation steps:
1. Created a PDF with two empty signature fields named suhasb@gmail.com and nikhil.courser@gmail.com from the original hello.pdf; the output file is hello_tag.pdf (program: TagPDFSignatureFields.java).
2. Signed a first time by fetching signature field suhasb@gmail.com from hello_tag.pdf; the output file is hello_signed.pdf (program: SignAndIdentifySignatureFields.java).
3. Signed a second time by fetching signature field nikhil.courser@gmail.com from hello_signed.pdf; the output file is hello_singed2.pdf (program: Sign2.java).
In step 2 the PDF is signed properly, but after step 3 the signature from step 2 becomes invalid, while the signature from step 3 shows as valid in Acrobat Reader.
The Java source code and sample PDFs are available for reference:
Google Drive link: pdf_multi_signs_pdfbox_java
Any help would be appreciated.
In short, there are a number of issues in your code. The issue that causes Adobe Reader to mark your first signature as invalid after a second signature is added is actually already in your preparation step TagPDFSignatureFields, where you create an invalid duplicate pages tree entry. The other issues should also be fixed, even though Adobe Reader currently does not complain about them.
The issues in detail...
Duplicate Page Entry
In TagPDFSignatureFields your method addEmptySignField starts like this:
private void addEmptySignField(String[] args) throws Exception, IOException {
// Create a new document with an empty page.
try (PDDocument document = PDDocument.load(new File(args[0]));)
{
PDPage page = document.getPage(0);
document.addPage(page);
Here you retrieve the first page of document and immediately add that page to document again. This causes the pages root tree node in your file hello_tag.pdf to look like this:
2 0 obj
<<
/Type /Pages
/Count 2
/Kids [6 0 R 6 0 R]
>>
endobj
I.e. the pages tree contains the same page object twice, which Adobe Reader does not accept but repairs under the hood. For the signed documents Adobe Reader warns about this in a vague way.
And in current versions (e.g. 2020.013.20066) Adobe Reader in the twice signed file even marks the first signature as broken. In earlier versions (e.g. 2019.012.20040) it did not do so. Probably this is an effect of the hardening of the validation code after the Shadow Attacks had been published.
As an aside: If you have a situation in which manipulating a signed document (form fill-ins, signing again, ...) breaks the old signatures, always also check whether the original document might already have issues. The checks whether changes applied to a signed document are allowed are quite sensitive to errors which otherwise are fixed under the hood and are therefore not visible.
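A quick way to spot this kind of defect is to look for repeated references in a /Kids array. The following is only a minimal sketch using plain Java string handling, not PDFBox; the class and method names are made up for illustration, and it assumes you have the array text (such as the dump shown above) at hand.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox.
final class KidsDuplicateCheck {
    // Matches indirect references such as "6 0 R" (object number, generation, R).
    private static final Pattern REF = Pattern.compile("(\\d+)\\s+(\\d+)\\s+R");

    /** Returns the references that occur more than once in the given /Kids array text. */
    static List<String> duplicateKids(String kidsArray) {
        Set<String> seen = new LinkedHashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        Matcher m = REF.matcher(kidsArray);
        while (m.find()) {
            String ref = m.group(1) + " " + m.group(2) + " R";
            // Set.add returns false if the reference was already seen.
            if (!seen.add(ref)) {
                duplicates.add(ref);
            }
        }
        return new ArrayList<>(duplicates);
    }
}
```

Calling duplicateKids("[6 0 R 6 0 R]") reports the reference 6 0 R as a duplicate. In the question's code the actual fix is simply not to call document.addPage(page) for a page that was just retrieved from the same document.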
Invalid Partial Field Names
You use email addresses as field names, suhasd@gmail.com and nikhil.courser@gmail.com in case of your example:
signatureField.setPartialName("suhasd@gmail.com");
...
signatureField1.setPartialName("nikhil.courser@gmail.com");
(TagPDFSignatureFields method addEmptySignField)
These partial field names are invalid: partial field names must not contain period characters ('.').
PDFBox in future versions will try to prevent this, see PDFBOX-5028.
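Until the library rejects such names itself, you could check or rewrite names before passing them to setPartialName. This is a minimal sketch; the class name, the method names and the choice of underscore as replacement character are assumptions for illustration only, and the example address is made up.

```java
// Hypothetical helper, not part of PDFBox.
final class FieldNames {
    /** True if the name contains no period and can therefore serve as a partial field name. */
    static boolean isValidPartialName(String name) {
        return name != null && name.indexOf('.') < 0;
    }

    /**
     * Replaces every period with an underscore (an arbitrary choice).
     * Note that different inputs can map to the same output, e.g. "a.b" and "a_b".
     */
    static String toPartialName(String raw) {
        return raw.replace('.', '_');
    }
}
```

For example, toPartialName("first.last@example.com") yields first_last@example_com, which no longer contains a period. If the original address is needed later, it would have to be stored elsewhere, since the mapping is not reversible.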
Setting the Default Resources And Default Appearances Upon Signing
During signing you set the default resources and default appearance of the AcroForm dictionary:
acroForm.setDefaultResources(resources);
...
acroForm.setDefaultAppearance(defaultAppearanceString);
(SignAndIdentifySignatureFields and Sign2 method addEmptySignField)
By itself this is not a bad thing, but beware: if you do this to a previously signed file which already has such entries and you set them to values different from before, this can invalidate the former signature; see the issue answered here.
Setting PDF Version Without Need
You try to change the claimed PDF version of the document:
document.setVersion(1.0f);
(SignAndIdentifySignatureFields method addEmptySignField)
document.setVersion(2.0f);
(Sign2 method addEmptySignField)
The first instruction is ignored as the document itself already requires a version of at least 1.5, but the second instruction indeed sets the document PDF version to 2.0 which may cause issues with older viewers.
...
There quite likely are more issues. I merely spotted these issues first, before I recognized that fixing only the first one, the Duplicate Page Entry, already sufficed to heal the first signature...
string docuAddr = @"C:\Users\psimmon\source\repos\PDFTESTAPP\PDFTESTAPP\TempForms\forms-www.courts.state.co.us-Forms-PDF-JDF1117.pdf";
byte[] bytes = Encoding.Unicode.GetBytes(docuAddr);
PdfLoadedDocument loadedDocument = new PdfLoadedDocument(bytes, true); // blows up here
PdfLoadedForm myForm = loadedDocument.Form;
PdfLoadedFormFieldCollection fields = myForm.Fields;
I am not sure what I have done wrong here. The PDF file itself opens fine, both in a browser and from a File Explorer window, so it has to be me. I guessed at most of this. All you very smart folks, I could use your gray matter. Forgive my stupidity.
The reported exception “could not find valid signature (%PDF-)” may occur because the file is not a PDF document. We suspect that a file of another format was saved with the “.pdf” extension. We cannot open or repair this type of document on our end; we have already added the details to our documentation.
The following page lists some of the error messages for corrupted documents that cannot be repaired:
UG: https://help.syncfusion.com/file-formats/pdf/open-and-save-pdf-file-in-c-sharp-vb-net#possible-error-messages-of-invalid-pdf-documents-while-loading
If you want to find this type of corrupted document, Syncfusion PDF Library provides support to check and report whether the existing PDF document is corrupted or not with corruption details and structure-level syntax errors.
UG: https://help.syncfusion.com/file-formats/pdf/working-with-document#find-corrupted-pdf-document
Blog: https://www.syncfusion.com/blogs/post/how-to-find-corrupted-pdf-files-in-c-sharp.aspx
KB: https://www.syncfusion.com/kb/9686/how-to-identify-the-corrupted-pdf-document-using-c-and-vb-net
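One thing that appears worth checking first is whether the byte array handed to the loader holds the file contents at all. In the question, Encoding.Unicode.GetBytes(docuAddr) encodes the characters of the path string, not the bytes of the file, so the data cannot begin with "%PDF-". The check can be sketched as follows; this is written in Java rather than C# purely for illustration, the class and method names are made up, and the path in the usage note is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any PDF library.
final class PdfHeaderCheck {
    private static final byte[] MARKER = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** True if the data begins with the ASCII marker "%PDF-". */
    static boolean startsWithPdfHeader(byte[] data) {
        if (data == null || data.length < MARKER.length) {
            return false;
        }
        for (int i = 0; i < MARKER.length; i++) {
            if (data[i] != MARKER[i]) {
                return false;
            }
        }
        return true;
    }
}
```

The UTF-16 bytes of a path such as C:\Temp\form.pdf fail this check, while the bytes read from an actual PDF file (in Java via java.nio.file.Files.readAllBytes) would normally pass it. Some viewers tolerate a few bytes of leading junk before the marker, which this strict sketch does not.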
I am working on an implementation where our system generates a PDF file for a user to download.
The key point of our process and system is that this PDF file should not be modifiable by the user or by a program on the user's computer (at least, not without bad intent), as the file can be uploaded to the system later on, where we need to make sure the file is in its original state by comparing its hash value.
We thought we had accomplished this by first disabling all permissions (CanModify, CanAssembleDocument, etc.) and then encrypting the document with an owner password. This prevented modification of the file by all readers we had access to. It now turns out that one of our users modifies a PDF as soon as he opens the file in Acrobat Reader and uses 'save as' to save the document to a new PDF file. We cannot reproduce this with the same reader version (2015.006.30497), but he can, every time.
The alternative of signing the PDF document is not an option for us, at least not with a PKI or any visible signature that users can see in their reader. If there is some sort of invisible signing option, that would be great, but I don't know how to do that.
Below is the code that we use to lock the PDF. For testing purposes we disabled ALL permissions, to no avail. We're using PDFBox 2.0.11.
Any suggestions as to what options there are to better lock the file against modification?
public static byte[] SealFile(byte[] pdfFile, String password) throws IOException
{
    // try-with-resources closes the document and the stream even if protect/save throws
    try (PDDocument doc = PDDocument.load(pdfFile);
         ByteArrayOutputStream bos = new ByteArrayOutputStream())
    {
        int keyLength = 256;
        AccessPermission ap = new AccessPermission();
        //Disable all
        ap.setCanModifyAnnotations(false);
        ap.setCanAssembleDocument(false);
        ap.setCanFillInForm(false);
        ap.setCanModify(false);
        ap.setCanExtractContent(false);
        ap.setCanExtractForAccessibility(false);
        ap.setCanPrint(false);
        //The user password is empty ("") so user can read without password. The admin password is
        // set to lock/encrypt the document.
        StandardProtectionPolicy spp = new StandardProtectionPolicy(password, "", ap);
        spp.setEncryptionKeyLength(keyLength);
        spp.setPermissions(ap);
        doc.protect(spp);
        doc.save(bos);
        return bos.toByteArray();
    }
}
This results in Adobe properties:
Edit (solution):
As suggested by @mkl (all credit to this person), we were able to solve the problem by using the AppendOnly flag, which is part of the AcroForm functionality. It turned out that the SignaturesExist flag was not required to solve our problem (and, after reading the specs, was not applicable).
Below is the solution we implemented:
/*
* This method is used to add the 'appendOnly flag' to the PDF document. This flag is part of
* the AcroForm functionality that instructs a PDF reader that the file is signed and should not be
* modified during the 'saved as' function. For full description see PDF specification PDF 32000-1:2008
* (https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf)
* paragraph 12.7.2 Interactive Form Dictionary
*/
public static void addAcroFormSigFlags(PDDocument pdfDoc) {
PDDocumentCatalog catalog = pdfDoc.getDocumentCatalog();
PDAcroForm acroForm = catalog.getAcroForm();
if (acroForm == null) {
acroForm = new PDAcroForm(pdfDoc);
catalog.setAcroForm(acroForm);
}
// AppendOnly:
// If set, the document contains signatures that may be invalidated if the
// file is saved (written) in a way that alters its previous contents, as
// opposed to an incremental update. Merely updating the file by appending
// new information to the end of the previous version is safe (see H.7,
// "Updating Example"). Conforming readers may use this flag to inform a
// user requesting a full save that signatures will be invalidated and
// require explicit confirmation before continuing with the operation
acroForm.setAppendOnly(true);
// SignatureExists: (Currently not used by us)
// If set, the document contains at least one signature field. This flag
// allows a conforming reader to enable user interface items (such as menu
// items or pushbuttons) related to signature processing without having to
// scan the entire document for the presence of signature fields.
// acroForm.setSignaturesExist(true);
// flag objects that changed (in case a 'saveIncremental' is done hereafter)
catalog.getCOSObject().setNeedToBeUpdated(true);
acroForm.getCOSObject().setNeedToBeUpdated(true);
}
Even if actually signing the PDF document is not an option for you, you can try and set the AcroForm flags that claim that a signature exists.
This should prevent programs that are sensitive to these flags (like Adobe Reader) from applying changes to the PDF, or at least they should apply their changes as incremental update which can be undone by truncating the file at its original size.
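The "undo by truncating" idea can be sketched on the server side as follows: cut the uploaded file back to the length of the originally generated file and compare hashes. This is a minimal sketch in plain Java, assuming that you stored the original length and a SHA-256 hash when generating the file; the class and method names are made up.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical helper, not part of PDFBox.
final class IncrementalUpdateCheck {
    /** SHA-256 digest of the given bytes. */
    static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /**
     * True if the first originalLength bytes of the uploaded file hash to
     * originalHash, i.e. any changes were merely appended after the original.
     */
    static boolean matchesAfterTruncation(byte[] uploaded, int originalLength, byte[] originalHash) {
        if (uploaded.length < originalLength) {
            return false;
        }
        byte[] head = Arrays.copyOf(uploaded, originalLength);
        return MessageDigest.isEqual(sha256(head), originalHash);
    }
}
```

This only helps if the viewer really saved its changes as an incremental update; a full rewrite of the file fails the check.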
The flags entry in question is the SigFlags entry in the AcroForm dictionary.
Bit position | Name | Meaning
1 | SignaturesExist | If set, the document contains at least one signature field. This flag allows an interactive PDF processor to enable user interface items (such as menu items or push-buttons) related to signature processing without having to scan the entire document for the presence of signature fields.
2 | AppendOnly | If set, the document contains signatures that may be invalidated if the file is saved (written) in a way that alters its previous contents, as opposed to an incremental update. Merely updating the file by appending new information to the end of the previous version is safe (see H.7, "Updating example"). Interactive PDF processors may use this flag to inform a user requesting a full save that signatures will be invalidated and require explicit confirmation before continuing with the operation.
(ISO 32000-2, Table 225 — Signature flags)
Thus, you should set the SigFlags entry in the AcroForm dictionary in the Catalog to 3. You may have to create the AcroForm dictionary first if your PDF does not have a form definition yet.
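The value 3 is simply both bits of the table combined. A small sketch of the arithmetic in plain Java (the class, constant and method names are made up for illustration):

```java
// Hypothetical constants mirroring the SigFlags bit positions.
final class SigFlags {
    static final int SIGNATURES_EXIST = 1;      // bit position 1 (lowest bit)
    static final int APPEND_ONLY = 1 << 1;      // bit position 2

    /** Both flags set, i.e. the value 3 suggested for the SigFlags entry. */
    static int both() {
        return SIGNATURES_EXIST | APPEND_ONLY;
    }

    /** True if the AppendOnly bit is set in the given SigFlags value. */
    static boolean isAppendOnly(int flags) {
        return (flags & APPEND_ONLY) != 0;
    }
}
```

Setting only AppendOnly, as in the solution quoted in the question's edit, corresponds to the value 2.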
I'm reaching out to the larger community of developers to seek help in understanding the real cause and possibly finding a fix. I have asked Aspose about this, and they have tracked the issue (PDFNET-42880) in their system. I think they are not going to investigate it anytime soon, as it is a very specific case. So I am posting here to ask for more details about:
What is difference in Adobe 'save as' vs. Foxit Reader 'save as' vs. Windows Reader 'save as' feature?
Issues with Adobe product that are not so obvious to figure out. I don't even know what to ask :D
Link to their (Aspose) old forum: https://www.aspose.com/community/forums/thread/845549/removing-stamps-fails-after-saving-stamped-file-from-adobe-acrobat.aspx
Case:
Created PDF with forms using OpenOffice (version 3.4.0), stamped with Aspose PDF, opened with Adobe Reader DC (or Adobe Acrobat XI), filled, saved as new file. Now this new file is fine, but when I try to remove stamps using Aspose (and replace with new stamp later), this is where things get interesting.
Files that I've tested with: https://1drv.ms/f/s!Auvpijam7a73iDzOqc6wZPuY9l81
Stamp_Location.png
OoPdfFormExample_WithStamp.pdf
OoPdfFormExample_WithStamp_StampRemoved.pdf
OoPdfFormExample_WithStamp_SavedFromFoxit.pdf
OoPdfFormExample_WithStamp_SavedFromFoxit_StampRemoved.pdf
OoPdfFormExample_WithStamp_SavedFromWindowsReader.pdf
OoPdfFormExample_WithStamp_SavedFromWindowsReader_StampRemoved.pdf
OoPdfFormExample_WithStamp_SavedFromAdobeReader.pdf
OoPdfFormExample_WithStamp_SavedFromAcrobat_StampRemoved.pdf
C# code that is used to remove the stamp(s):
/// <summary>
/// Removes stamps from PDF file.
/// </summary>
/// <param name="pdfFile"></param>
private static void RemoveStamps( string pdfFile )
{
// Create PDF content editor.
Aspose.Pdf.Facades.PdfContentEditor contentEditor = new Aspose.Pdf.Facades.PdfContentEditor();
// Open the temp file.
contentEditor.BindPdf( pdfFile );
// Process all pages.
foreach ( Page page in contentEditor.Document.Pages )
{
// Get the stamp infos.
Aspose.Pdf.Facades.StampInfo[] stampInfos = contentEditor.GetStamps( page.Number );
//Process all stamp infos
foreach ( Aspose.Pdf.Facades.StampInfo stampInfo in stampInfos )
{
// Use try catch so we can output possible error w/out break point.
try
{
contentEditor.DeleteStampById( stampInfo.StampId );
}
catch ( Exception e )
{
Console.WriteLine( e );
}
}
}
// Save changes to the temp file.
contentEditor.Save( StampRemovedPdfFile );
}
Using Adobe: Removing the stamp runs without errors, but opening the resulting file reports a problem with the file:
"An error exists on this Page. Acrobat may not display the page correctly. Please contact the person who created the PDF document to correct the problem."
EDIT: After more testing: just opening the file with Aspose and saving it without modifications did not break the file; it only broke once the stamp was removed with the Aspose method.
Using Foxit: The only difference in the process is that the file is opened in Foxit Reader and saved from there. The stamp is removed and the file is fine; it works with any PDF reader.
Using Windows (10) Reader: The only difference in the process is that the file is opened in Windows Reader and saved from there. The stamp is removed and the file is fine; it works with any PDF reader.
Ok - The thing you are referring to is not a stamp annotation. It's an XObject that gets drawn into the page content. Why Aspose refers to it as a Stamp is... well... a mystery. When you remove the "stamp" (not a stamp) Aspose seems to be removing the XObject but not the instructions to draw it from the page Contents stream... that's why you're getting the error in Acrobat. The other applications are more permissive with bad PDF and my guess is when they write out the file, they are removing references to non-existent objects. You can make Acrobat attempt to fix problems like this by selecting Save As Optimized PDF. However, you are far better off removing the drawing instruction in addition to the XObject.
Because of the way you've created the file and added the "stamp", your page content stream is an array of streams. Remove the last item in the array, which is the instruction to draw the XObject, and your file will work without errors in all the viewers. Note: It won't always be the case that the last item in the content array is your stamp. It's just that your stamp is the last thing to get drawn, so it goes at the end.
If your intention is to "replace" the "stamp", you'll want to do so by removing the XObject as you are doing now, then remove the instruction, then add the new "stamp".
I am attempting to save documents to a MongoDB cluster (sharded replica sets) and am having a strange issue. I am using pymongo 2.7.2 and TokuMX 1.5, mongodb 2.4.10.
When I attempt to save (overwrite) existing documents I am getting an exception that looks like the document I am saving is too large:
doc = db.collection.find_one()
db.collection.save(doc)
pymongo.errors.OperationFailure: BSONObj size: 18798961 (0x71D91E01) is invalid. Size must be between 0 and 16793600(16MB) First element: op: "u"
However this works fine:
doc = db.collection.find_one()
db.collection.remove({'_id': doc['_id']})
db.collection.save(doc)
The document in question is about 9 MB, so it looks like replacing the document somehow doubles its size, exceeding the 16 MB limit.
Any ideas as to what could cause this behavior?
Apparently this is a known issue with TokuMX. Oplog entries are twice the size of the document, so replacing a 9 MB document results in an 18 MB oplog entry, which raises the exception.
The solution would be to limit document writes to less than 8 MB so that oplog entries never exceed 16 MB.
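The arithmetic behind that limit can be sketched as follows. The sketch assumes, as described above, that the oplog entry is roughly twice the document size and ignores any additional per-entry overhead; the limit is the number quoted in the error message, and the class and method names are made up.

```java
// Hypothetical helper illustrating the size arithmetic only.
final class OplogSizeCheck {
    // Upper bound quoted in the error message ("between 0 and 16793600").
    static final long MAX_BSON_BYTES = 16_793_600L;

    /** True if an oplog entry of about twice the document size stays within the limit. */
    static boolean replaceFits(long documentBytes) {
        return 2 * documentBytes <= MAX_BSON_BYTES;
    }

    /** Largest document size for which the doubled entry still fits. */
    static long maxReplaceableBytes() {
        return MAX_BSON_BYTES / 2;
    }
}
```

Under this assumption the threshold is 8,396,800 bytes, which matches the "less than 8 MB" rule of thumb; the reported entry of 18,798,961 bytes corresponds to a document of roughly 9.4 million bytes.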
I think this is a side effect of how save is implemented in PyMongo.
Under the hood, if the document has an _id then the save(doc) is turned into an update(doc, doc). That is where the doubling comes into play, since the query plus the update is 18 MB.
When you removed the _id you changed the save(doc) into an insert(doc) of a new document with a new _id. I don't think that is what you wanted.
Rather than using save, I would recommend constructing a query with just the _id field from the original document and doing the update call manually. I would even go so far as to suggest entering a Jira ticket to get PyMongo to do this for you.
HTH,
Rob.
I've just written code that attaches files to a PDF document, based on the code on the PDFBox page.
PDEmbeddedFilesNameTreeNode efTree = new PDEmbeddedFilesNameTreeNode();
PDComplexFileSpecification fs = new PDComplexFileSpecification();
fs.setFile( "Test.txt" );
InputStream is = ...;
PDEmbeddedFile ef = new PDEmbeddedFile(doc, is );
ef.setSubtype( "text/plain" );
ef.setSize( data.length );
ef.setCreationDate( new GregorianCalendar() );
fs.setEmbeddedFile( ef );
Map efMap = new HashMap();
efMap.put( "My first attachment", fs );
efTree.setNames( efMap );
PDDocumentNameDictionary names = new PDDocumentNameDictionary( doc.getDocumentCatalog() );
names.setEmbeddedFiles( efTree );
doc.getDocumentCatalog().setNames( names );
doc.save("attachedPDF");
That works.
Then I attached files and signed the document. The result: everything works!
Then, I get the signed document (which have attachments), and then sign the document with another attachment (I create revision 2. In the other words, I attach another files to signed document and sign again). The result was that the old file was gone: the new files had overwritten the old ones (the signature became invalid too, because the hash changed; that's correct).
So I changed the code to get the old files from the PDEmbeddedFilesNameTreeNode and add them to the new file map:
PDEmbeddedFilesNameTreeNode oldFiles=names.getEmbeddedFiles();
if(oldFiles!=null){
Map oldFilesMap = oldFiles.getNames();
Iterator iterator = oldFilesMap.entrySet().iterator();
while (iterator.hasNext()) {
Map.Entry mapEntry = (Map.Entry) iterator.next();
System.out.println("The key is: " + mapEntry.getKey()+ ",value is :" + mapEntry.getValue());
efMap.put(mapEntry.getKey(), mapEntry.getValue());
}
}
efTree.setNames(efMap);
That works, but the signature is again invalid when I create the second revision.
I think the main problem is that when I add new files to the same name dictionary, the hash of the document changes.
So I think I should create a new name dictionary in the next revision (i.e. I must not use the existing one), but maybe I am wrong. I don't understand it. What can I do now? What do you think?
By the way, I think the following is incorrect for the next revision:
PDDocumentNameDictionary names = new PDDocumentNameDictionary(doc.getDocumentCatalog());
that's my sample documents
Then, I get the signed document (which have attachments), and then sign the document with another attachment (I create revision 2. In the other words, I attach another files to signed document and sign again).
Whatever other problems you have trying this, the undertaking itself is already doomed. Even if you do that as an incremental update, this is not an allowed operation on a signed document.
The operations allowed on previously signed documents are either restricted by specification (in case of certification signatures) or by extrapolation from the certification rules (in case of approval signatures only).
In case of certification signatures (DocMDP signatures), the P value in the DocMDP transform parameters dictionary selects the set of operations allowed on the document:
(Optional) The access permissions granted for this document. Valid values shall be:
1 No changes to the document shall be permitted; any change to the document shall invalidate the signature.
2 Permitted changes shall be filling in forms, instantiating page templates, and signing; other changes shall invalidate the signature.
3 Permitted changes shall be the same as for 2, as well as annotation creation, deletion, and modification; other changes shall invalidate the signature.
Default value: 2.
(section 12.8.2.2.2 in ISO 32000-1)
As you see, attaching files is not among them.
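The quoted rules can be written down as a small lookup. This is only a sketch of the specification text above, not of what any particular viewer actually enforces; the class, the enum and its constants are made up for illustration.

```java
// Hypothetical model of the DocMDP permission levels quoted above.
final class DocMdp {
    enum Change { FORM_FILL, PAGE_TEMPLATE, SIGN, ANNOTATION, ATTACH_FILE }

    /** True if the quoted specification text permits the change for the given P value. */
    static boolean permitted(int p, Change change) {
        switch (change) {
            case FORM_FILL:
            case PAGE_TEMPLATE:
            case SIGN:
                return p == 2 || p == 3;
            case ANNOTATION:
                return p == 3;
            default:
                // Anything else, including attaching files, invalidates the signature.
                return false;
        }
    }
}
```

For every P value, permitted(p, DocMdp.Change.ATTACH_FILE) is false, which is the point made above.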
Unfortunately the specification does not clearly say which changes shall be permitted if there is no certification signature (DocMDP signature); therefore, one might be tempted to assume everything is allowed.
Actually, though, current PDF viewers, especially the dominant Adobe Reader, assume differently and extrapolate a set of permitted changes. In the case of Adobe Reader these are (cf. this answer for details) the same as for DocMDP with P = 3 plus adding signature fields. (It is assumed that the author did not really consider the signing use case and, therefore, likely forgot adding empty signature fields; otherwise, though, the set of allowed changes was considered apropos.)
Thus, no attaching of files either.
If you want to handle multiple attachments and multiple signatures, you may consider supplementing an already signed PDF by creating a new PDF, adding the original PDF and the new files as attachments (and setting the new PDF to display the original PDF by default), and then signing the whole construct.
PS: Concerning your actual attempt: When trying to manipulate the already signed document DOC-signed.pdf, you seem to have started by reading and writing it using PDFBox; I assume this because DOC-signed.pdf is not a starting piece of DOC-signed-signed.pdf but that latter document indeed contains the new attachment and the second signature in an incremental update.
This caused the original file to be internally reorganized and the original signature to be broken in the process. You should instead start by creating an identical copy of the file and add the second signature as an incremental update.
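Whether a later revision was really produced as an incremental update of an earlier file can be checked byte by byte: the earlier file must be a starting piece of the later one. A minimal sketch in plain Java (the class and method names are made up):

```java
// Hypothetical helper, not part of PDFBox.
final class RevisionCheck {
    /** True if every byte of the earlier file reappears unchanged at the start of the later file. */
    static boolean isStartingPiece(byte[] earlier, byte[] later) {
        if (earlier.length > later.length) {
            return false;
        }
        for (int i = 0; i < earlier.length; i++) {
            if (earlier[i] != later[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Applied to the bytes of DOC-signed.pdf and DOC-signed-signed.pdf (read e.g. with java.nio.file.Files.readAllBytes), a result of false indicates that the first file was rewritten before the second signature was added.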