Adobe breaks stamped PDF when saving as new file / what is difference in Adobe 'save as' vs. Foxit Reader 'save as' feature - pdf

I'm reaching out to the larger developer community for help understanding the real cause and, if possible, finding a fix. I have asked Aspose about this, and they are tracking the issue (PDFNET-42880) in their system. I don't think they will investigate it anytime soon, as it is a very specific case. So I am posting here to ask for more details about:
What is difference in Adobe 'save as' vs. Foxit Reader 'save as' vs. Windows Reader 'save as' feature?
Issues with Adobe product that are not so obvious to figure out. I don't even know what to ask :D
Link to their (Aspose) old forum: https://www.aspose.com/community/forums/thread/845549/removing-stamps-fails-after-saving-stamped-file-from-adobe-acrobat.aspx
Case:
Created PDF with forms using OpenOffice (version 3.4.0), stamped with Aspose PDF, opened with Adobe Reader DC (or Adobe Acrobat XI), filled, saved as new file. Now this new file is fine, but when I try to remove stamps using Aspose (and replace with new stamp later), this is where things get interesting.
Files that I've tested with: https://1drv.ms/f/s!Auvpijam7a73iDzOqc6wZPuY9l81
Stamp_Location.png
OoPdfFormExample_WithStamp.pdf
OoPdfFormExample_WithStamp_StampRemoved.pdf
OoPdfFormExample_WithStamp_SavedFromFoxit.pdf
OoPdfFormExample_WithStamp_SavedFromFoxit_StampRemoved.pdf
OoPdfFormExample_WithStamp_SavedFromWindowsReader.pdf
OoPdfFormExample_WithStamp_SavedFromWindowsReader_StampRemoved.pdf
OoPdfFormExample_WithStamp_SavedFromAdobeReader.pdf
OoPdfFormExample_WithStamp_SavedFromAcrobat_StampRemoved.pdf
C# code that is used to remove the stamp(s):
using System;
using Aspose.Pdf;

/// <summary>
/// Removes stamps from PDF file.
/// </summary>
/// <param name="pdfFile">Path of the PDF file to process.</param>
private static void RemoveStamps( string pdfFile )
{
    // Create PDF content editor.
    Aspose.Pdf.Facades.PdfContentEditor contentEditor = new Aspose.Pdf.Facades.PdfContentEditor();
    // Open the temp file.
    contentEditor.BindPdf( pdfFile );
    // Process all pages.
    foreach ( Page page in contentEditor.Document.Pages )
    {
        // Get the stamp infos.
        Aspose.Pdf.Facades.StampInfo[] stampInfos = contentEditor.GetStamps( page.Number );
        // Process all stamp infos.
        foreach ( Aspose.Pdf.Facades.StampInfo stampInfo in stampInfos )
        {
            // Use try/catch so we can output a possible error without a breakpoint.
            try
            {
                contentEditor.DeleteStampById( stampInfo.StampId );
            }
            catch ( Exception e )
            {
                Console.WriteLine( e );
            }
        }
    }
    // Save changes to the temp file.
    contentEditor.Save( StampRemovedPdfFile );
}
Using Adobe: Removing the stamp appears to work, but opening the resulting file in Adobe shows this error:
"An error exists on this Page. Acrobat may not display the page correctly. Please contact the person who created the PDF document to correct the problem."
EDIT: After more testing: just opening the file with Aspose and saving it without modifications did not break it. The file only broke once the stamp had been removed with the Aspose method.
Using Foxit: The only difference in the process is that the file is opened in Foxit Reader and saved from there. The stamp is removed and the file is fine; it works with any PDF reader.
Using Windows (10) Reader: The only difference in the process is that the file is opened in Windows Reader and saved from there. The stamp is removed and the file is fine; it works with any PDF reader.

Ok - The thing you are referring to is not a stamp annotation. It's an XObject that gets drawn into the page content. Why Aspose refers to it as a Stamp is... well... a mystery. When you remove the "stamp" (not a stamp) Aspose seems to be removing the XObject but not the instructions to draw it from the page Contents stream... that's why you're getting the error in Acrobat. The other applications are more permissive with bad PDF and my guess is when they write out the file, they are removing references to non-existent objects. You can make Acrobat attempt to fix problems like this by selecting Save As Optimized PDF. However, you are far better off removing the drawing instruction in addition to the XObject.
Because of the way you've created the file and added the "stamp", your page content stream is an array of streams. Remove the last item in the array, which is the instruction to draw the XObject, and your file will work without errors in all the viewers. Note: It won't always be the case that the last item in the content array is your stamp. It's just that your stamp is the last thing to get drawn, so it goes at the end.
If your intention is to "replace" the "stamp", you'll want to do so by removing the XObject as you are doing now, then remove the instruction, then add the new "stamp".
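To make the "remove the drawing instruction too" step concrete, here is a minimal sketch in plain Java. It is not Aspose code: it assumes you have already decoded each entry of the page's Contents array into text, and the XObject name "Stamp1" is hypothetical. It relies only on the fact that the PDF operator Do paints a named XObject, and it drops any fragment that still invokes the removed one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Sketch: drop content-stream fragments that still paint a removed XObject. */
final class ContentArrayCleaner {

    /**
     * Returns the fragments that do not invoke "/<xObjectName> Do".
     * Each fragment is assumed to be the decoded text of one /Contents array entry.
     */
    static List<String> dropPaintingFragments(List<String> fragments, String xObjectName) {
        // "Do" is the operator that paints a named XObject, e.g. "/Stamp1 Do".
        Pattern paint = Pattern.compile("/" + Pattern.quote(xObjectName) + "\\s+Do\\b");
        List<String> kept = new ArrayList<>();
        for (String fragment : fragments) {
            if (!paint.matcher(fragment).find()) {
                kept.add(fragment);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical page: form text first, the stamp drawn last.
        List<String> contents = List.of(
                "BT /F1 12 Tf 72 720 Td (Form text) Tj ET",
                "q 1 0 0 1 400 50 cm /Stamp1 Do Q");
        System.out.println(dropPaintingFragments(contents, "Stamp1"));
    }
}
```

Dropping a whole fragment is only safe when that fragment contains nothing but the stamp drawing, as in the file discussed here. If the Do call shares a stream with other content, the stream would have to be edited at the operator level instead.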

Related

Syncfusion.Pdf.PdfException: "Could not find valid signature (%PDF-)."

string docuAddr = @"C:\Users\psimmon\source\repos\PDFTESTAPP\PDFTESTAPP\TempForms\forms-www.courts.state.co.us-Forms-PDF-JDF1117.pdf";
byte[] bytes = Encoding.Unicode.GetBytes(docuAddr);
PdfLoadedDocument loadedDocument = new PdfLoadedDocument(bytes, true); // <--- blows here
PdfLoadedForm myForm = loadedDocument.Form;
PdfLoadedFormFieldCollection fields = myForm.Fields;
Not sure what I have done wrong here. The PDF file opens fine, both in a browser and from a File Explorer window, so the problem has to be on my side. I guessed at most of this, so I could use your gray matter, all you very smart folks. Forgive my stupidity.
The reported exception "Could not find valid signature (%PDF-)" can occur when the file is not a PDF document. We suspect that a file in another format has been saved with the ".pdf" extension. We cannot open or repair this type of document on our end; the details are already covered in our documentation.
The following section lists some of the corruption error messages that cannot be repaired:
UG: https://help.syncfusion.com/file-formats/pdf/open-and-save-pdf-file-in-c-sharp-vb-net#possible-error-messages-of-invalid-pdf-documents-while-loading
To detect this type of corrupted document, the Syncfusion PDF Library can check whether an existing PDF document is corrupted and report the corruption details and structure-level syntax errors.
UG: https://help.syncfusion.com/file-formats/pdf/working-with-document#find-corrupted-pdf-document
Blog: https://www.syncfusion.com/blogs/post/how-to-find-corrupted-pdf-files-in-c-sharp.aspx
KB: https://www.syncfusion.com/kb/9686/how-to-identify-the-corrupted-pdf-document-using-c-and-vb-net
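One thing worth checking before blaming the file: in the snippet from the question, Encoding.Unicode.GetBytes(docuAddr) encodes the text of the path itself, not the contents of the file, so the byte array handed to the loader cannot begin with %PDF-. The sketch below is plain Java rather than the Syncfusion API, and the path in it is hypothetical. It only shows what a header check on the first bytes looks like.

```java
import java.nio.charset.StandardCharsets;

/** Sketch: test whether a byte array begins with the PDF header marker. */
final class PdfHeaderCheck {
    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    static boolean startsWithPdfHeader(byte[] data) {
        if (data == null || data.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (data[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // UTF-16LE bytes of a path string (what encoding the path yields): not a PDF.
        byte[] pathBytes = "C:\\TempForms\\form.pdf".getBytes(StandardCharsets.UTF_16LE);
        // Bytes that begin the way real PDF file content does.
        byte[] pdfBytes = "%PDF-1.7\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(startsWithPdfHeader(pathBytes)); // false
        System.out.println(startsWithPdfHeader(pdfBytes));  // true
    }
}
```

This check looks only at offset 0, which is stricter than some viewers are. For the C# code in the question, passing the file's actual contents (for example from File.ReadAllBytes) instead of the encoded path would be the first thing to try.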

PDFBox multiple signature giving invalid signature Java

I have a PDF multi-signing workflow requirement: a PDF gets signed multiple times without changes to the document, i.e. two or more people can sign the same document.
I am trying to add two signatures to a PDF, but after signing it the second time the first signature becomes invalid. I used the PDFBox Java API to create the signatures.
PDF creation steps:
1. Create a PDF with empty signature fields named suhasb@gmail.com and nikhil.courser@gmail.com from the original hello.pdf; output file name hello_tag.pdf. Run program TagPDFSignatureFields.java.
2. Sign the first time by fetching signature field suhasb@gmail.com from hello_tag.pdf; output file name hello_signed.pdf. Run program SignAndIdentifySignatureFields.java.
3. Sign the second time by fetching signature field nikhil.courser@gmail.com from hello_signed.pdf; output file name hello_singed2.pdf. Run program Sign2.java.
After step 2 the PDF is signed properly, but after step 3 the signature from step 2 is invalidated, while the step 3 signature shows as okay in Acrobat Reader.
Please find the Java source code and PDF samples for reference at the link below.
Google drive link pdf_multi_signs_pdfbox_java
Any help would be appreciated.
In short there are a number of issues in your code. The issue causing Adobe Reader to mark your first signature as invalid after adding a second signature actually already is in your preparation step TagPDFSignatureFields where you create an invalid duplicate pages tree entry. The other issues should also be fixed, even though Adobe Reader currently does not complain.
The issues in detail...
Duplicate Page Entry
In TagPDFSignatureFields your method addEmptySignField starts like this:
private void addEmptySignField(String[] args) throws Exception, IOException {
    // Create a new document with an empty page.
    try (PDDocument document = PDDocument.load(new File(args[0]));)
    {
        PDPage page = document.getPage(0);
        document.addPage(page);
Here you retrieve the first page of document and immediately add that page to document again. This causes the pages root tree node in your file hello_tag.pdf to look like this:
2 0 obj
<<
/Type /Pages
/Count 2
/Kids [6 0 R 6 0 R]
>>
endobj
I.e. the pages tree contains the same page object twice which Adobe Reader does not accept but repairs under the hood. For the signed documents Adobe Reader warns about this in a vague way:
And in current versions (e.g. 2020.013.20066) Adobe Reader in the twice signed file even marks the first signature as broken. In earlier versions (e.g. 2019.012.20040) it did not do so. Probably this is an effect of the hardening of the validation code after the Shadow Attacks had been published.
As an aside: If you have a situation in which manipulating a signed document (form fill-ins, signing again, ...) breaks the old signatures, always also check whether the original document might already have issues. The checks for whether changes applied to a signed document are allowed are quite sensitive to errors which otherwise are fixed under the hood and, therefore, are not visible.
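If you want to check a file for this kind of duplicate entry before signing, a rough sketch in plain Java (not PDFBox) is below. It assumes you have the text of a Kids array at hand, as in the object dump above, and reports any indirect reference that appears more than once.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: find indirect references listed more than once in a /Kids array. */
final class DuplicateKidsCheck {
    // Matches "<object number> <generation> R", e.g. "6 0 R".
    private static final Pattern REF = Pattern.compile("(\\d+)\\s+(\\d+)\\s+R");

    static List<String> duplicateRefs(String kidsArray) {
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        Matcher m = REF.matcher(kidsArray);
        while (m.find()) {
            String ref = m.group(1) + " " + m.group(2) + " R";
            // Set.add returns false when the reference was already seen.
            if (!seen.add(ref) && !duplicates.contains(ref)) {
                duplicates.add(ref);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        System.out.println(duplicateRefs("/Kids [6 0 R 6 0 R]"));
    }
}
```

In the code from the question the actual fix is simply not to call document.addPage(page) on a page that is already part of the document.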
Invalid Partial Field Names
You use email addresses as field names, suhasd@gmail.com and nikhil.courser@gmail.com in the case of your example:
signatureField.setPartialName("suhasd@gmail.com");
...
signatureField1.setPartialName("nikhil.courser@gmail.com");
(TagPDFSignatureFields method addEmptySignField)
These partial field names are invalid: partial field names must not contain period characters ('.').
PDFBox in future versions will try to prevent this, see PDFBOX-5028.
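A small sketch of how the names could be checked or cleaned before calling setPartialName. This is plain Java, and the choice of '_' as the replacement character is only an assumption for illustration; any period-free mapping that keeps your names unique would do.

```java
/** Sketch: validate or derive period-free partial field names. */
final class FieldNames {

    /** A partial field name must not contain a period. */
    static boolean isValidPartialName(String name) {
        return name != null && name.indexOf('.') < 0;
    }

    /** Replaces each period with an underscore (assumed replacement character). */
    static String toPartialName(String raw) {
        return raw.replace('.', '_');
    }

    public static void main(String[] args) {
        String raw = "nikhil.courser@gmail.com";
        System.out.println(isValidPartialName(raw));
        System.out.println(toPartialName(raw));
    }
}
```

Note that a simple replacement like this can make two different addresses collide (a.b and a_b map to the same name), so a real workflow may want a separate lookup from signer to field name.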
Setting the Default Resources And Default Appearances Upon Signing
During signing you set the default resources and default appearance of the AcroForm dictionary:
acroForm.setDefaultResources(resources);
...
acroForm.setDefaultAppearance(defaultAppearanceString);
(SignAndIdentifySignatureFields and Sign2 method addEmptySignField)
By itself this is not a bad thing but beware, if you do this to a previously signed file which already has such entries and you set them to different values than before, this can invalidate the former signature, see the issue answered here.
Setting PDF Version Without Need
You try to change the claimed PDF version of the document:
document.setVersion(1.0f);
(SignAndIdentifySignatureFields method addEmptySignField)
document.setVersion(2.0f);
(Sign2 method addEmptySignField)
The first instruction is ignored as the document itself already requires a version of at least 1.5, but the second instruction indeed sets the document PDF version to 2.0 which may cause issues with older viewers.
...
There quite likely are more issues. These are merely the ones I spotted before I recognized that fixing only the first one, the Duplicate Page Entry, already sufficed to heal the first signature...

PDFBox encrypted / locked PDF is still modified by Adobe Reader during 'save as'

I am working on an implementation where our system generates a PDF file for a user to download.
The key to our process and system is that this PDF file should not be modifiable by the user or by a program on the user's computer (at least, not without bad intent), as the file can be uploaded to the system later on, at which point we need to make sure the file is in its original state by comparing its hash value.
We thought we had accomplished this by first disabling all permissions (CanModify, CanAssembleDocument, etc.) and then encrypting the document with an owner's password. This prevented modification of the file by all readers we had access to. It now turns out that one of our users modifies a PDF as soon as he opens the file in Acrobat Reader and uses 'save as' to save the document to a new PDF file. We cannot reproduce this with the same reader version (2015.006.30497), but he can, every time.
The alternative of signing the PDF document is not an option for us, at least not with a PKI or any visible signature that users can see in their reader. If there is some sort of invisible signing option, that would be great, but I don't know how to do it.
Below is the code that we use to lock the PDF. For testing purposes we disabled ALL permissions, to no avail. We're using PDFBox 2.0.11.
Any suggestions on what options there are to better lock the file against modification?
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.encryption.AccessPermission;
import org.apache.pdfbox.pdmodel.encryption.StandardProtectionPolicy;

public static byte[] SealFile(byte[] pdfFile, String password) throws IOException
{
    PDDocument doc = PDDocument.load(pdfFile);
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    int keyLength = 256;
    AccessPermission ap = new AccessPermission();
    // Disable all
    ap.setCanModifyAnnotations(false);
    ap.setCanAssembleDocument(false);
    ap.setCanFillInForm(false);
    ap.setCanModify(false);
    ap.setCanExtractContent(false);
    ap.setCanExtractForAccessibility(false);
    ap.setCanPrint(false);
    // The user password is empty ("") so the user can read without a password. The admin
    // password is set to lock/encrypt the document.
    StandardProtectionPolicy spp = new StandardProtectionPolicy(password, "", ap);
    spp.setEncryptionKeyLength(keyLength);
    spp.setPermissions(ap);
    doc.protect(spp);
    doc.save(bos);
    doc.close();
    bos.flush();
    return bos.toByteArray();
}
This results in Adobe properties:
Edit (solution):
As suggested by @mkl (all credits to this person), we were able to solve the problem with the use of the appendOnly flag, which is part of the AcroForm functionality. It turned out that the signatureExists flag was not required for our problem to be solved (and, after reading the specs, was not applicable).
Below is the solution we implemented:
/*
 * This method is used to add the 'appendOnly flag' to the PDF document. This flag is part of
 * the AcroForm functionality that instructs a PDF reader that the file is signed and should not be
 * modified during the 'save as' function. For the full description see PDF specification PDF 32000-1:2008
 * (https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf)
 * paragraph 12.7.2 Interactive Form Dictionary
 */
public static void addAcroFormSigFlags(PDDocument pdfDoc) {
    PDDocumentCatalog catalog = pdfDoc.getDocumentCatalog();
    PDAcroForm acroForm = catalog.getAcroForm();
    if (acroForm == null) {
        acroForm = new PDAcroForm(pdfDoc);
        catalog.setAcroForm(acroForm);
    }
    // AppendOnly:
    // If set, the document contains signatures that may be invalidated if the
    // file is saved (written) in a way that alters its previous contents, as
    // opposed to an incremental update. Merely updating the file by appending
    // new information to the end of the previous version is safe (see H.7,
    // "Updating Example"). Conforming readers may use this flag to inform a
    // user requesting a full save that signatures will be invalidated and
    // require explicit confirmation before continuing with the operation.
    acroForm.setAppendOnly(true);
    // SignatureExists: (Currently not used by us)
    // If set, the document contains at least one signature field. This flag
    // allows a conforming reader to enable user interface items (such as menu
    // items or pushbuttons) related to signature processing without having to
    // scan the entire document for the presence of signature fields.
    // acroForm.setSignaturesExist(true);
    // Flag objects that changed (in case a 'saveIncremental' is done hereafter).
    catalog.getCOSObject().setNeedToBeUpdated(true);
    acroForm.getCOSObject().setNeedToBeUpdated(true);
}
Even if actually signing the PDF document is not an option for you, you can try and set the AcroForm flags that claim that a signature exists.
This should prevent programs that are sensitive to these flags (like Adobe Reader) from applying changes to the PDF, or at least they should apply their changes as incremental update which can be undone by truncating the file at its original size.
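The "undo by truncating" idea could look roughly like the sketch below, in plain Java. It assumes your system stored the original file's length and SHA-256 hash at generation time; the class and method names are made up for illustration. It cuts the uploaded bytes back to the original length and compares the hash of that prefix.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

/** Sketch: check whether an uploaded file still starts with the untouched original. */
final class IncrementalUpdateCheck {

    static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    /**
     * True if the first originalLength bytes of the upload hash to originalHash,
     * i.e. any change was appended as an incremental update.
     */
    static boolean originalIsIntact(byte[] uploaded, int originalLength, byte[] originalHash) {
        if (originalLength < 0 || uploaded.length < originalLength) {
            return false;
        }
        byte[] prefix = Arrays.copyOf(uploaded, originalLength);
        return MessageDigest.isEqual(sha256(prefix), originalHash);
    }
}
```

This only helps when the viewer really saved incrementally. A full rewrite of the file changes the leading bytes too, and then the check fails as it should.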
The flags entry in question is the SigFlags entry in the AcroForm dictionary.
Bit position | Name | Meaning
1 | SignaturesExist | If set, the document contains at least one signature field. This flag allows an interactive PDF processor to enable user interface items (such as menu items or push-buttons) related to signature processing without having to scan the entire document for the presence of signature fields.
2 | AppendOnly | If set, the document contains signatures that may be invalidated if the file is saved (written) in a way that alters its previous contents, as opposed to an incremental update. Merely updating the file by appending new information to the end of the previous version is safe (see H.7, "Updating example"). Interactive PDF processors may use this flag to inform a user requesting a full save that signatures will be invalidated and require explicit confirmation before continuing with the operation.
(ISO 32000-2, Table 225 — Signature flags)
Thus, you should set the SigFlags entry in the AcroForm dictionary in the Catalog to 3. You may have to create the AcroForm dictionary to start with if your PDF does not have a form definition yet.
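For anyone wondering where the value 3 comes from: bit position 1 has the value 1 and bit position 2 has the value 2, so setting both gives 3. A tiny sketch in plain Java (the class and constant names are just for illustration, this is not the PDFBox API):

```java
/** Sketch: how the SigFlags integer is built from its two bit positions. */
final class SigFlags {
    static final int SIGNATURES_EXIST = 1;   // bit position 1, value 1
    static final int APPEND_ONLY = 1 << 1;   // bit position 2, value 2

    static int combine(boolean signaturesExist, boolean appendOnly) {
        int flags = 0;
        if (signaturesExist) {
            flags |= SIGNATURES_EXIST;
        }
        if (appendOnly) {
            flags |= APPEND_ONLY;
        }
        return flags;
    }

    static boolean isAppendOnly(int flags) {
        return (flags & APPEND_ONLY) != 0;
    }

    public static void main(String[] args) {
        System.out.println(combine(true, true)); // 3
    }
}
```

Setting only AppendOnly, as in the solution code in the question, corresponds to the value 2.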

Save an image present in PDF on local File System

This is my first experience of using PDFBox jar files. Also, I have recently started working on TestComplete. In short, all these things are new for me and I have been stuck on one issue for last few hours. I will try to explain as much as I can. Would really appreciate any help!
Objective:
To save an image present in a PDF file on the file system
Issue:
When the line objImage.write2file_2(strSavePath); gets executed, I get the error "Object doesn't support this property or method".
I am taking some help from here
Code:
function fn_PDFImage()
{
    var objPdfFile, strPdfFilePath, strSavePath, objPages, objPage, objImages, objImage, imgbuffer;
    strPdfFilePath = "C:\\Users\\aabb\\Desktop\\name.pdf";
    strSavePath = "C:\\Users\\aabb\\Desktop\\abc";
    objPdfFile = JavaClasses.org_apache_pdfbox_pdmodel.PDDocument.load_3(strPdfFilePath);
    objPages = objPdfFile.getDocumentCatalog().getAllPages();
    //getting a page with index=1
    objPage = objPages.get(1);
    objImages = objPage.getResources().getXObjects().values().toArray();
    Log.Message(objImages.length); //This is returning 14. i.e, 14 images
    //getting an image with index=1
    objImage = objImages.items(1);
    Log.Message(typeof objImage); //returns "Object" which means it is not null
    //saving the image
    objImage.write2file_2(strSavePath); //<---GETTING AN ERROR HERE
}
ERROR:
If you are bothered about the method name write2file_2, please read this excerpt from the link which I have shared:
In Java, the constructor of a class has the name of this class.
TestComplete changes the constructor names to newInstance(). If a
class has overloaded constructors, TestComplete names them like
newInstance, newInstance_2, newInstance_3 and so on.
Additional Info:
I have imported the JAR file (pdfbox-app-1.8.13.jar) and its classes into TestComplete. I am not sure if I need to import some other JAR file or its classes here:
XObjects are not always image XObjects. And write2file is in the class PDXObjectImage so you need to check your object type first.
Re the second question asked in the comment: a form XObject isn't something you can save. Form XObjects are content streams with resources etc., similar to pages. What you can do, however, is explore these too and check whether their resources contain images. See how this is done in the ExtractImages source code of PDFBox 1.8.
However there are other places where there can be images (e.g. patterns, soft masks, inline images); this is only available in PDFBox 2.*, see the ExtractImages source code there. (Note that the class names are different).
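The "check the type first, then look inside form XObjects" idea can be sketched like this. These are stand-in types in plain Java, not the PDFBox classes: ImageXObj plays the role of an image XObject, FormXObj that of a form XObject with its own resources, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch with stand-in types: collect images, descending into form XObjects. */
final class XObjectWalk {
    interface XObj {}

    /** Stand-in for an image XObject (the only kind that can be written to a file). */
    static final class ImageXObj implements XObj {
        final String name;
        ImageXObj(String name) { this.name = name; }
    }

    /** Stand-in for a form XObject, which carries its own XObject resources. */
    static final class FormXObj implements XObj {
        final Map<String, XObj> resources = new LinkedHashMap<>();
    }

    static List<ImageXObj> collectImages(Map<String, XObj> xObjects) {
        List<ImageXObj> images = new ArrayList<>();
        for (XObj x : xObjects.values()) {
            if (x instanceof ImageXObj) {
                // Only image XObjects are collected for saving.
                images.add((ImageXObj) x);
            } else if (x instanceof FormXObj) {
                // A form cannot be saved itself, but its resources may hold images.
                images.addAll(collectImages(((FormXObj) x).resources));
            }
        }
        return images;
    }
}
```

In the TestComplete script the equivalent would be to test each entry's class before calling write2file_2 on it. The sketch does not guard against forms that reference each other in a cycle.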

GhostScript .NET not continuing past certain pages

I've created a program which needs to convert PDF files into image files, and GhostScript seemed the best choice for this. But once in a while the library stalls completely on a page and doesn't continue; it just keeps using CPU, as though it were caught in an infinite loop. The error is easily reproducible, as it happens every time on the specific PDF files it occurs on, yet GhostScript reports no error of any kind, and nothing is out of the ordinary in the PDF files themselves as far as I can see.
I have, however, been able to find out that the stalling is due to a specific element or elements in the PDF files. Deleting those elements lets the PDF render easily in GhostScript, but that is neither a solution nor an answer I can use.
PDF link* - http://www.filedropper.com/usjunis1-32webtest
*saved with free version of PDF-XChange Editor, so it has watermarks at the top, but it is the square that creates the stalling. I've also seen it happen on vector graphics objects, so it is not limited to squares.
Code -
private void startImageProcessing(String pdfFile)
{
    GhostscriptVersionInfo gvi = new GhostscriptVersionInfo(new Version(0, 0, 0), Directory.GetCurrentDirectory() + @"\gsdll32.dll", string.Empty, GhostscriptLicense.GPL);
    Ghostscript.NET.Processor.GhostscriptProcessor processor = new Ghostscript.NET.Processor.GhostscriptProcessor(gvi, true);
    processor.StartProcessing(CreateTestArgs(pdfFile, pdfFile.Substring(0, pdfFile.Length - 4) + "\\" + prefix + "-%03d.jpg", 72 * scale), new ConsoleStdIO(true));
}

private static string[] CreateTestArgs(string inputPath, string outputPath, int dpi)
{
    List<string> gsArgs = new List<string>();
    gsArgs.Add("-dSAFER");
    gsArgs.Add("-dBATCH");
    gsArgs.Add("-dNOPAUSE");
    gsArgs.Add("-sDEVICE=jpeg");
    gsArgs.Add("-r" + dpi);
    gsArgs.Add("-dJPEGQ=100");
    gsArgs.Add("-dNumRenderingThreads=" + Environment.ProcessorCount.ToString());
    gsArgs.Add("-dTextAlphaBits=4");
    gsArgs.Add("-dGraphicsAlphaBits=4");
    gsArgs.Add(@"-sOutputFile=" + outputPath);
    gsArgs.Add(@"-f" + inputPath);
    return gsArgs.ToArray();
}
I've also created a PDF file containing only one of the offending elements for testing, and the error occurs whether that file is saved by Adobe Acrobat or by PDF-XChange Editor, so it is not due to the specific program I used to save the PDF either.