I want to merge several PDF documents into one. The source documents can consist of PDFs created by me and others created by other organisations. I have no control over the permissions attached to documents not created by me. Some of these documents (those not created by me) may have permissions set. If a document requires a password to open it I do not attempt to merge it.
I am using iText 5.5.1 (I think that is the latest) to create a PdfCopy object to contain the resulting document, and a reader for each source PDF in a loop (I am passing a list of the documents to be merged). I check each document for the number of pages and then, using the PdfCopy object, import each page and add it to the PdfCopy object (these two steps are separate because of the intricacies of the language I am using to work with the Java objects, RPG on an IBM iSeries). The problem is that I can attach a reader to a PDF with permissions and get the page count, but as soon as I try to import a page into the copy object the program complains and terminates with the message 'PdfReader not opened with owner password'. I am not able to get the people providing the documents from other organisations to leave them unprotected (there are very, very good reasons why the original document is protected from change), but I need to consolidate these documents into one.
My question is, can I copy PDFs with permissions into a new document using iText, and can I do it without knowing the owner password? In addition to that, I guess the other question would be, is it legal?
Thanks
GarryM
Introduction: A PDF file can be encrypted using a public certificate. If you have such a PDF, you need the corresponding private key to decrypt it. A PDF file can also be encrypted using two passwords: a user password and an owner password. If the PDF is encrypted using a user password, you need at least one of the two passwords to decrypt it.
Assumption: I assume that the PDFs are encrypted with nothing but an owner password. You can open these documents in a PDF viewer without having to provide a user password, which means the content can be accessed, but there are some restrictions in place depending on the permissions that are set.
Situation: iText is a library that allows you to access PDFs at a very low level, without a GUI. It can easily access a PDF that is encrypted with nothing but an owner password, but it can't check if you respect the permissions that are defined for the PDF. To make sure that you are aware of your responsibilities, an exception is thrown saying PdfReader not opened with owner password. This is often too strict: sometimes you have the permission to assemble a PDF file, but with iText it's all or nothing. Either you can open the file, or you can't. iText doesn't check what you're doing afterwards.
Solution: There is a static Boolean parameter called unethicalreading that is set to false by default. You can change it like this:
PdfReader.unethicalreading = true;
--EDIT (since iText 7):
pdfReader.setUnethicalReading(true);
From now on, it will be as if the PDFs aren't encrypted.
Is this legal? It's not that clear and I am not a lawyer, but:
It used to be illegal when Adobe still owned the copyright on the PDF specification. Adobe granted the right to use that copyright to any developer on certain conditions. One of these conditions was that you didn't "crack" a PDF. Removing the password from a PDF broke your "contract" with Adobe to use the PDF specification and you risked being sued.
This changed when Adobe donated the PDF specification to the community in order to make it an ISO standard. Now everyone can use this international standard, and the above risk (being sued by Adobe for infringing the copyright) no longer exists.
As the ISO standard documents the mechanism of encryption with an owner password, and it is very easy to use the ISO standard to decrypt a document without having that password, the concept of introducing an owner password to enforce permissions is flawed from a technical point of view. It's merely a psychological way to prevent people from doing something with your document that you, as an author, do not want.
It's like a stop sign on a deserted road. It says: you should stop here, but nobody/nothing is going to stop you if no one is around.
Suggested approach:
My approach is to decrypt the PDF using the unethicalreading parameter, and to look at the permissions that are set. If the permissions don't allow assembly, I refuse the document. I also set permissions on the resulting PDF where I try to find the combination of permissions that respect the permissions set on the original documents.
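The approach described above can be sketched in plain Java without touching iText at all. This is only a minimal illustration under assumptions: the class name PermissionMerge and its methods are hypothetical, the bit values are taken from Table 22 of ISO 32000-1, and the integer passed in for each source is assumed to be the /P value you have already read from that document (for an unencrypted source you would pass ALL). How you obtain /P from the reader is left out.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of iText: combines the /P permission flags
// of several source documents into one value for the merged result.
final class PermissionMerge {
    // Bit N of /P has the value 1 << (N - 1); positions per ISO 32000-1, Table 22.
    static final int PRINT = 1 << 2;                      // bit 3
    static final int MODIFY_CONTENTS = 1 << 3;            // bit 4
    static final int COPY = 1 << 4;                       // bit 5
    static final int MODIFY_ANNOTATIONS = 1 << 5;         // bit 6
    static final int FILL_FORMS = 1 << 8;                 // bit 9
    static final int EXTRACT_FOR_ACCESSIBILITY = 1 << 9;  // bit 10
    static final int ASSEMBLE = 1 << 10;                  // bit 11
    static final int PRINT_HIGH_QUALITY = 1 << 11;        // bit 12

    static final int ALL = PRINT | MODIFY_CONTENTS | COPY | MODIFY_ANNOTATIONS
            | FILL_FORMS | EXTRACT_FOR_ACCESSIBILITY | ASSEMBLE | PRINT_HIGH_QUALITY;

    static boolean allowsAssembly(int p) {
        return (p & ASSEMBLE) != 0;
    }

    // Returns only the permissions that every source grants.
    // Refuses the whole merge if any source forbids assembly.
    static int combine(List<Integer> sourcePermissions) {
        int combined = ALL;
        for (int p : sourcePermissions) {
            if (!allowsAssembly(p)) {
                throw new IllegalArgumentException("A source document does not allow assembly");
            }
            combined &= p; // a permission survives only if all sources grant it
        }
        return combined;
    }

    public static void main(String[] args) {
        int first = PRINT | COPY | ASSEMBLE;
        int second = PRINT | FILL_FORMS | ASSEMBLE;
        int merged = combine(Arrays.asList(first, second));
        System.out.println("print allowed: " + ((merged & PRINT) != 0));
        System.out.println("copy allowed: " + ((merged & COPY) != 0));
    }
}
```

Taking the bitwise AND is the most conservative choice: the merged document never grants anything that one of the originals withheld.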
In some cases, it's not that hard: the people who don't know the passwords are often the owners of the documents, who forgot the passwords that were used to encrypt them. In that case, simple permission from the owners of the documents is sufficient to decrypt them.
Final remark: I'm the original developer of iText and I'm responsible for introducing the unethicalreading parameter. I've chosen the name unethicalreading only to make sure people are aware of what they are doing. It doesn't mean that using that parameter is always unethical or illegal.
Related
I have a very strange problem and I am not sure where the issue is. I am creating a PDF and not setting any security restrictions or a password. When I open the PDF in Adobe Reader DC and look at the properties, it does show the Security Method as No Security. However, Document Assembly and Page Extraction are set to Not Allowed.
The PDF was created from a Word document and I simply did a save as PDF, no other options.
In General
Please be aware that the "Document Restrictions Summary" summarizes restrictions that arise from a number of factors, the following ones coming to my mind:
Restrictions applied in the course of encryption
When encrypting a PDF, permissions for a number of features can be restricted for a regular user. Thus, if the PDF is opened with the user password, these restrictions apply and are shown in the summary; if it is opened with the owner password, they don't apply.
These are the restrictions one usually thinks of when checking the document properties Security tab.
Restrictions applied in the course of signing (certification & approval)
When a PDF is digitally signed with an integrated signature, a number of features are automatically restricted, and some more features may be restricted depending on the MDP transforms and locks applied by the signatures. These restrictions also are shown in the summary.
Restrictions applied by the viewer software used
The viewer you use may restrict what you can do with a PDF, e.g. a number of features of the Acrobat Pro editions are not present in Adobe Reader or are present but by default disabled. These restrictions also appear in the summary.
These viewer related restrictions may even differ based on the kind of document you have. E.g. in Adobe Reader they differ between PDF documents carrying a XFA form definition and those that don't.
Restrictions changed by usage rights signatures (aka Reader Enabling)
There is a special kind of PDF signature (usage rights signatures) which can lift some restrictions caused by the viewer software. If a PDF contains such a valid usage rights signature, some usually disabled features of the viewer may be enabled, a fact which also reflects in the summary.
If a PDF contains a usage rights signature which has been invalidated, e.g. by disallowed changes to the document, not only those usually disabled features remain disabled but some more features may become disabled, which again shows in the summary.
There may be additional factors still...
In Your Case
The "Not Allowed" entries you see for your file in Adobe Reader DC are restrictions of the third type listed above: they are restrictions applied by the viewer software used. If you opened the file in a more capable Acrobat edition, those entries would become "Allowed".
I started writing here:
PHP PDF password protection (no open without password)
But I can't add comments due to my reputation here (I'm better on AskUbuntu but I can't take my rep points from there). I also started a bounty there, and if someone will answer here in two days with an acceptable solution, I will award there.
Now, the problem: SetProtection method is not working as expected.
Wanted behaviour: create a protected/encrypted PDF document with TCPDF library so that the document view is always granted to everyone without asking any password, but if one tries to edit, a password is requested.
I use the following syntax:
$pdf->SetProtection(array('modify', 'copy', 'annot-forms', 'fill-forms', 'extract', 'assemble'), null, 'mypwd', 1);
I can open the file with a pdf viewer as expected.
If I try to open the file with Libreoffice Draw, the password is requested (as expected), but I'm able to edit the document BOTH with mypwd (expected) AND giving a blank password (NOT expected).
What is the right syntax, if any, to have pdf readable by everyone BUT editable ONLY with "mypwd" provided?
EDIT:
Here is a file with a blank user password and a strong master password. Ilovepdf.com finds it UNLOCKED, and Libreoffice Draw can edit it.
This is NOT the expected behaviour.
https://www.dropbox.com/s/864p8xjh1ue041z/tracking_12750_16.pdf?dl=0
As far as I can see your example PDF is encrypted just the way you wanted, with an empty user password and a non-empty owner password. Thus, TCPDF does just what it was asked to do.
Most likely the problem is that your expectation is too strong: If a program can open a PDF for reading, that program can do anything with the PDF, no matter how restricted it is configured to be. The permissions and different owner and user roles require the cooperation of the software in question, they are not technically enforced.
This already is clear from the specification:
Once the document has been opened and decrypted successfully, a PDF reader technically has access to the entire contents of the document. There is nothing inherent in PDF encryption that enforces the document permissions specified in the encryption dictionary. PDF readers shall respect the intent of the document creator by restricting user access to an encrypted PDF file according to the permissions contained in the file.
(ISO 32000-2, section 7.6.4 Standard security handler)
Apparently Libreoffice Draw simply does not behave as required by the PDF specification, i.e. it is not properly restricting user access to an encrypted PDF file according to the permissions contained in the file. Possibly this is by design, possibly it is just a programming glitch.
You should simply be aware that your expectation to
create a protected/encrypted PDF document with TCPDF library so that the document view is always granted to everyone without asking any password, but if one tries to edit, a password is requested.
cannot be implemented using standard PDF encryption facilities for arbitrary PDF processors, merely for those that follow the PDF specification requirement quoted above.
There are some providers of PDF DRM software solutions which are not so easy to circumvent, but I doubt any of them can withstand a determined hacker. (Unless the solution in question is not giving the PDF to the user at all but only images in a custom, webservice-based viewer; but this is not your use case.)
Depending on your actual requirements, you might want to look into using digital signatures instead of encryption; if your objective is to make sure that any recipient can be sure that they got your document contents and not what someone else edited into it, this appears more apropos.
I want to place watermark stamps on all PDFs which I currently have, but some are read-only. Is there any way I can know whether a file that I've opened is read-only or is not editable using IText?
There are two things to check. One is plain file permissions (you did check those, right?). The other is whether there is an encryption dictionary and no user password (in other words, the document is encrypted with an owner password but no user password). In this case the encryption dictionary will have an entry called /P, which is a bit field of flags for the allowed operations. Table 22 of the ISO PDF spec (ISO 32000-1) describes their meaning. Most likely bit 4 (mask 1 << 3) is cleared, which means modifying the contents is not allowed.
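As a rough illustration of the bit test, here is a small self-contained Java sketch. The class PermissionFlags and its methods are made up for this example and are not iText API; it assumes you have already obtained the /P value as a 32-bit signed integer. The sample values are simply numbers with the relevant bits set or cleared.

```java
// Hypothetical helper, not part of iText: tests individual flags in a /P value.
final class PermissionFlags {
    // Bit positions are 1-based, as in ISO 32000-1, Table 22.
    static final int BIT_PRINT = 3;
    static final int BIT_MODIFY_CONTENTS = 4;
    static final int BIT_MODIFY_ANNOTATIONS = 6;

    // True if the given 1-based bit position is set in p.
    static boolean isSet(int p, int bitPosition) {
        return (p & (1 << (bitPosition - 1))) != 0;
    }

    // Bit 4 cleared means the contents may not be modified.
    static boolean allowsModification(int p) {
        return isSet(p, BIT_MODIFY_CONTENTS);
    }

    public static void main(String[] args) {
        int printOnly = -1852;   // low 12 bits are 0x8C4: print bits set, modify bit cleared
        int nothing = -3904;     // low 12 bits are 0x0C0: no permission bits set
        int everything = -1;     // all bits set
        System.out.println(allowsModification(printOnly));
        System.out.println(allowsModification(nothing));
        System.out.println(allowsModification(everything));
    }
}
```

Whether a watermark counts as modifying the contents or as adding an annotation depends on how you stamp it, so you may want to test bit 6 as well.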