I need to create a database of different PDF files, which are either uploaded by users to the server or saved as bookmarks to PDF files available on the internet. The files available on the internet are opened in pdf.js. I came across the fingerprint that pdf.js generates for some of its operations and was wondering if I could use it to identify a PDF uniquely. To do that, I also need to generate this fingerprint myself for documents that are uploaded but never opened via viewer.js (I can get my hands on the fingerprint via viewer.js, but not otherwise). I can use iTextSharp to parse the PDFs, but I have no clue how pdf.js generates the fingerprint.
It seems pdf.js is doing the following in its fingerprint():
If available, it uses the first ID string from the PDF trailer.
If there's no ID, an MD5 hash of (part of) the byte content is calculated.
That's my quick interpretation of the current pdf.js fingerprint() source
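To reproduce this outside the viewer, the two steps above can be sketched as follows. This is only a rough sketch under assumptions, not the pdf.js code itself: the 1024-byte prefix length is my reading of the pdf.js source and may differ between versions, the regular expression only handles IDs written as hex strings (not literal strings), and the function name is made up for illustration.

```python
import hashlib
import re

# Assumption: pdf.js hashes only the first 1024 bytes when no ID is present.
FINGERPRINT_FIRST_BYTES = 1024

# Captures the first element of "/ID [<hex> <hex>]" (hex-string form only).
_ID_RE = re.compile(rb"/ID\s*\[\s*<([0-9A-Fa-f\s]*)>")


def fingerprint(pdf_bytes: bytes) -> str:
    """Return a hex fingerprint: the first ID string if present, else an MD5."""
    matches = _ID_RE.findall(pdf_bytes)
    if matches:
        # Incremental updates append newer trailers, so take the last match.
        hex_id = re.sub(rb"\s", b"", matches[-1]).decode("ascii").lower()
        if hex_id:
            return hex_id
    # Fallback: MD5 over the leading bytes of the file.
    return hashlib.md5(pdf_bytes[:FINGERPRINT_FIRST_BYTES]).hexdigest()
```

Whatever you implement on the server, compare its output against the value viewer.js reports for a handful of real files before relying on it as a database key.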
I have a banking client for whom I have designed an iOS app that populates the client's details into the account-opening application PDF forms and generates the final PDF. I am generating the PDF using CoreGraphics. However, the PDF is editable in Adobe Acrobat Pro, so the contents of the application form can be changed. Is there any way to restrict editing of the PDF after it is generated from CoreGraphics? I have encrypted the PDF with a password, but the client needs the PDF to be non-editable.
See Protecting PDF Content
Reading between the lines a bit, because the docs are not overly clear, I think that when you create the PDF context using CGPDFContextCreate(), you pass a dictionary as its auxiliaryInfo, using the key kCGPDFContextOwnerPassword with a value that's some arbitrary password string. This encrypts the document so that only the owner (i.e. the people with that password) can work with the contents. The docs don't explicitly say that this prevents editing, but I'm guessing that's implied, because they list special keys to block printing and copying (preventing editing seems like the thing one would always want when encrypting a PDF).
We have some code that generates a data-filled PDF (FDF) file from an Excel spreadsheet, which is then sent to DocuSign in our test environment.
Some of these work, and some come back with an error "PDF_VALIDATION_FAILED".
We have narrowed it down to the PDF document itself and have stripped the original template down to just four fields. We have also stripped our Excel spreadsheet down to four basic fields, using (for example) "a,1,a,2" as one input and "aa,1,a,2" as another; one will consistently work and the other will consistently fail.
Viewed in a local PDF viewer (Adobe and PDF-XChange Editor), the generated PDFs appear fine. Viewing the documents side by side in a hex/diff editor (WinMerge) shows minor differences in the streams being sent (as expected).
Is there any documentation on what validation is performed on the PDF, so that we can emulate it locally and make sure our PDFs are valid before sending them to the DocuSign API?
Thanks
Template
I am able to successfully create an envelope with the documents you have provided.
See here for the complete CreateEnvelope request that I have used.
I have used these documents that you have provided
Working PDF
Non Working PDF
I want to merge several PDF documents into one. The source documents can consist of PDFs created by me and others created by other organisations. I have no control over the permissions attached to documents not created by me. Some of these documents (those not created by me) may have permissions set. If a document requires a password to open it I do not attempt to merge it.
I am using iText 5.5.1 (I think that is the latest) to create a PdfCopy object to contain the resulting document, and a reader for each source PDF in a loop (I am passing a list of the documents to be merged). I check each document for the number of pages and then, using the PdfCopy object, import each page and add it to the PdfCopy object (these two steps are separate because of the intricacies of the language I am using to work with the Java objects, RPG on an IBM iSeries). The problem is that I can attach a reader to a PDF with permissions and get the page count, but as soon as I try to import a page into the copy object, the program complains and terminates with the message 'PdfReader not opened with owner password'. I am not able to get the people providing the documents from other organisations to leave them unprotected (there are very, very good reasons why the original document is protected from change), but I need to consolidate these documents into one.
My question is: can I copy PDFs with permissions into a new document using iText, and can I do it without knowing the owner password? In addition to that, I guess the other question would be: is it legal?
Thanks
GarryM
Introduction: A PDF file can be encrypted using a public certificate. If you have such a PDF, you need the corresponding private key to decrypt it. A PDF file can also be encrypted using two passwords: a user password and an owner password. If the PDF is encrypted using a user password, you need at least one of the two passwords to decrypt it.
Assumption: I assume that the PDFs are encrypted with nothing but an owner password. You can open these documents in a PDF viewer without having to provide a user password, which means the content can be accessed, but there are some restrictions in place depending on the permissions that are set.
Situation: iText is a library that allows you to access PDFs at a very low level, without a GUI. It can easily access a PDF that is encrypted with nothing but an owner password, but it can't check if you respect the permissions that are defined for the PDF. To make sure that you are aware of your responsibilities, an exception is thrown saying PdfReader not opened with owner password. This is often too strict: sometimes you have the permission to assemble a PDF file, but with iText it's all or nothing. Either you can open the file, or you can't. iText doesn't check what you're doing afterwards.
Solution: There is a static Boolean parameter called unethicalreading that is set to false by default. You can change it like this:
PdfReader.unethicalreading = true;
EDIT (since iText 7, where this is a setter on the reader instance):
pdfReader.setUnethicalReading(true);
From now on, it will be as if the PDFs aren't encrypted.
Is this legal? It's not that clear and I am not a lawyer, but:
It used to be illegal when Adobe still owned the copyright on the PDF specification. Adobe granted the right to use that copyright to any developer on certain conditions. One of these conditions was that you didn't "crack" a PDF. Removing the password from a PDF broke your "contract" with Adobe to use the PDF specification and you risked being sued.
This changed when Adobe donated the PDF specification to the community in order to make it an ISO standard. Now everyone can use this international standard, and the above (the risk of being sued by Adobe for infringing the copyright) no longer exists.
As the ISO standard documents the mechanism of encryption with an owner password, and it is very easy to use the ISO standard to decrypt a document without having that password, the concept of introducing an owner password to enforce permissions is flawed from a technical point of view. It's merely a psychological way to discourage people from doing something with your document that you, as an author, do not want.
It's like a stop sign on a deserted road. It says: you should stop here, but nobody/nothing is going to stop you if no one is around.
Suggested approach:
My approach is to decrypt the PDF using the unethicalreading parameter, and to look at the permissions that are set. If the permissions don't allow assembly, I refuse the document. I also set permissions on the resulting PDF where I try to find the combination of permissions that respect the permissions set on the original documents.
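As a rough sketch of that check, independent of iText: the permissions live in the integer /P entry of the encryption dictionary, with the bit positions defined in the PDF specification. The constant and function names below are made up for illustration (they are not iText's), and the sketch assumes you have already read each document's /P value, using None for a document that isn't encrypted at all.

```python
# Permission bits of the /P entry (bit numbers are 1-based in the PDF spec).
# Names are illustrative, not iText's constants.
PRINT = 1 << 2                      # bit 3
MODIFY_CONTENTS = 1 << 3            # bit 4
COPY = 1 << 4                       # bit 5
MODIFY_ANNOTATIONS = 1 << 5         # bit 6
FILL_FORMS = 1 << 8                 # bit 9
EXTRACT_FOR_ACCESSIBILITY = 1 << 9  # bit 10
ASSEMBLE = 1 << 10                  # bit 11
PRINT_HIGH_QUALITY = 1 << 11        # bit 12

ALL_PERMISSIONS = (PRINT | MODIFY_CONTENTS | COPY | MODIFY_ANNOTATIONS
                   | FILL_FORMS | EXTRACT_FOR_ACCESSIBILITY | ASSEMBLE
                   | PRINT_HIGH_QUALITY)


def merged_permissions(source_permissions):
    """Refuse sources that forbid assembly; return the common permissions."""
    result = ALL_PERMISSIONS
    for index, p in enumerate(source_permissions):
        if p is None:
            continue  # unencrypted source: nothing is restricted
        if not p & ASSEMBLE:
            raise PermissionError(f"document {index} does not allow assembly")
        result &= p  # keep only what every source allows
    return result
```

The returned value is the most permissive combination that still respects every source, which is what I would then set on the merged PDF.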
In some cases, it's not that hard: the people who don't know the passwords are often the owners of the documents themselves, who have forgotten the passwords that were used to encrypt them. In that case, the simple permission of the owners of the documents is sufficient to decrypt them.
Final remark: I'm the original developer of iText and I'm responsible for introducing the unethicalreading parameter. I've chosen the name unethicalreading only to make sure people are aware of what they are doing. It doesn't mean that using that parameter is always unethical or illegal.
I am developing an archiving system that stores documents in a database and provides various functionalities to the user. I have added a part to sign and verify any document in the database. However, I am stuck on the logic and am wondering where I should place the signing function.
Hints about my aims:
No document should be uploaded to the database without a signature.
If a document is not changed, it should retain its signature.
If the document does not have a signature, it should be signed with the uploader's signature.
The signature will not encrypt the file, so it will still be readable after the signing process is applied.
The initial solution I used was to place the signing procedure in the form that is called by the Upload button and to store the signature of the file in a separate column in the Documents table in the database. However, that solution turned out to be invalid for my scenario: if an employee downloaded a file and then uploaded it again, it would be signed by him, and the original signature would be lost. Also, the signature would have no significance outside the system.
My main question:
Is there a way to store the signature inside the documents?
Hint: My system will deal only with PDF, JPEG, TIFF, MS Office and TXT documents.
Subsidiary Request: It would be awesome if there's a way to store the signature in any type of files!
Is there a way to store the signature inside the documents?
A digital signature must be built using a hash of the document that is being signed. Since adding a signature to a document modifies the document (which invalidates the hash), there is no general solution to storing a digital signature inside a document.
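Some document formats allow for digital signing and define what portion is to be excluded from the hash. Of the formats that were listed, PDF does this (a PDF signature covers a declared byte range that leaves out the signature value itself), and the Open XML MS Office formats can carry embedded signatures too; as far as I know, JPEG, TIFF and TXT have no such standard mechanism. (Though PGP could be used on TXT documents.)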
Since signatures sign the hash of a document, you could simply create a table mapping hashes to signatures. Thus, downloading and re-uploading a document will not remove existing signatures, since the hash will remain the same. The usefulness of this approach depends, of course, on the semantic meaning of a signature in your system.
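A minimal sketch of such a table, kept in memory for illustration: documents are keyed by their SHA-256 digest, and the signature itself is treated as an opaque blob produced by whatever signing mechanism you already have. The class and method names are hypothetical; in your system this would be a database table with a hash column.

```python
import hashlib
from collections import defaultdict


class SignatureRegistry:
    """Maps a document's content hash to the signatures made over it."""

    def __init__(self):
        self._by_hash = defaultdict(list)

    @staticmethod
    def digest(content: bytes) -> str:
        # The hash identifies the content, independent of file name or uploader.
        return hashlib.sha256(content).hexdigest()

    def add(self, content: bytes, signer: str, signature: bytes) -> None:
        # `signature` is opaque here; it is created and verified elsewhere.
        self._by_hash[self.digest(content)].append((signer, signature))

    def signatures_for(self, content: bytes):
        # Returns a copy so callers cannot mutate the registry by accident.
        return list(self._by_hash.get(self.digest(content), []))
```

On upload, you would look the content up first and sign with the uploader's key only if `signatures_for` returns nothing, so an unchanged file that is downloaded and uploaded again keeps its original signature.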
I need to make an incremental update (adding some existing PDF pages) to a signed PDF while keeping the included signature (which covers the first page) valid.
I've seen some posts saying that this is possible with PdfStamper (iTextSharp), but I'm unable to find an example of how to make it append.
Changing an already signed PDF would imply a security hole in the PDF signing functionality/spec. The purpose of signing a PDF is to guarantee to the reader that it has not been altered by anyone other than the original author.
I think your only options are to send the extra pages in a separate PDF, or to change the original PDF and have it re-signed.