Is there a way to read password protected PDFs with tabula-py? - pdf

I have password protected PDFs with some tables. (I have the passwords to them).
Currently I'm using PDFminer.six to extract data from these PDFs to text but I want to use tabula-py instead to extract tables.
Is there a way to do this?

After spending a few more minutes on the documentation, I realised that read_pdf accepts a "password" argument.
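A minimal sketch of that call, assuming tabula-py and a Java runtime are installed. The wrapper name, the file name and the password are placeholders, and the `reader` parameter is a hypothetical hook added only so the wrapper can be tried without tabula present.

```python
def read_protected_tables(pdf_path, password, pages="all", reader=None):
    """Extract tables from a password-protected PDF as a list of DataFrames."""
    if reader is None:
        # Imported lazily: tabula-py needs a Java runtime when it is called.
        import tabula
        reader = tabula.read_pdf
    # The password is passed straight through to read_pdf.
    return reader(pdf_path, password=password, pages=pages)


# Example (placeholder file name and password):
# tables = read_protected_tables("statement.pdf", "s3cret")
# for table in tables:
#     print(table.head())
```

By default read_pdf only looks at the first page, so `pages="all"` is passed here to cover the whole document.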

Related

Merging write protected pdfs using itextsharp [duplicate]

I want to merge several PDF documents into one. The source documents can consist of PDFs created by me and others created by other organisations. I have no control over the permissions attached to documents not created by me. Some of these documents (those not created by me) may have permissions set. If a document requires a password to open it I do not attempt to merge it.
I am using iText 5.5.1 (I think that is the latest) to create a PdfCopy object to contain the resulting document and a reader for each source PDF in a loop (I am passing a list of the documents to be merged). I check each document for the number of pages and then, using the PdfCopy object, import each page and add it to the PdfCopy object (the two steps are separate because of the intricacies of the language I am using to work with the Java objects, RPG on an IBM iSeries). The problem is that I can attach a reader to a PDF with permissions and get the page count, but as soon as I try to import a page into the copy object the program complains and terminates with the message 'PdfReader not opened with owner password'. I am not able to get the people providing the documents from other organisations to leave the documents unprotected (there are very, very good reasons why the original document is protected from change), but I need to consolidate these documents into one.
My question is: can I copy PDFs with permissions into a new document using iText, and can I do it without knowing the owner password? In addition to that, I guess the other question would be: is it legal?
Thanks
GarryM
Introduction: A PDF file can be encrypted using a public certificate. If you have such a PDF, you need the corresponding private certificate to decrypt it. A PDF file can be encrypted using two passwords: a user password and an owner password. If the PDF is encrypted using a user password, you need at least one of the two passwords to decrypt it.
Assumption: I assume that the PDFs are encrypted with nothing but an owner password. You can open these documents in a PDF viewer without having to provide a user password, which means the content can be accessed, but there are some restrictions in place depending on the permissions that are set.
Situation: iText is a library that allows you to access PDFs at a very low level, without a GUI. It can easily access a PDF that is encrypted with nothing but an owner password, but it can't check if you respect the permissions that are defined for the PDF. To make sure that you are aware of your responsibilities, an exception is thrown saying PdfReader not opened with owner password. This is often too strict: sometimes you have the permission to assemble a PDF file, but with iText it's all or nothing. Either you can open the file, or you can't. iText doesn't check what you're doing afterwards.
Solution: There is a static Boolean parameter called unethicalreading that is set to false by default. You can change it like this:
PdfReader.unethicalreading = true;
--EDIT (since iText 7):
pdfReader.setUnethicalReading(true);
From now on, it will be as if the PDFs aren't encrypted.
Is this legal? It's not that clear and I am not a lawyer, but:
It used to be illegal when Adobe still owned the copyright on the PDF specification. Adobe granted the right to use that copyright to any developer on certain conditions. One of these conditions was that you didn't "crack" a PDF. Removing the password from a PDF broke your "contract" with Adobe to use the PDF specification and you risked being sued.
This changed when Adobe donated the PDF specification to the community in order to make it an ISO standard. Now everyone can use this international standard, and the above (risk of being sued by Adobe for infringing the copyright) no longer exists.
As the ISO standard documents the mechanism of encryption with an owner password and it is very easy to use the ISO standard to decrypt a document without having that password, the concept of introducing an owner password to enforce permissions is flawed from a technical point of view. It's merely a psychological way to prevent people from doing something with your document that you, as an author, do not want.
It's like a stop sign on a deserted road. It says: you should stop here, but nobody/nothing is going to stop you if no one is around.
Suggested approach:
My approach is to decrypt the PDF using the unethicalreading parameter, and to look at the permissions that are set. If the permissions don't allow assembly, I refuse the document. I also set permissions on the resulting PDF where I try to find the combination of permissions that respect the permissions set on the original documents.
In some cases, it's not that hard: the people who want to decrypt the PDFs are often the owners of the documents, who forgot the passwords that were used to encrypt them. In that case, the simple permission of the owners of the documents is sufficient to decrypt them.
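The permission check in the suggested approach could be sketched as follows. This is plain Python over the integer /P values, not iText code: it assumes the permission value of each source document has already been read, and the function name is hypothetical. The assembly flag is bit 11 in Table 22 of ISO 32000-1.

```python
ALLOW_ASSEMBLY = 1 << 10  # bit 11 (1-based) in Table 22 of ISO 32000-1


def merged_permissions(source_permissions):
    """Return the /P value for the merged file.

    Raises PermissionError if any source document forbids assembly.
    """
    combined = -1  # all bits set, i.e. nothing restricted yet
    for index, p in enumerate(source_permissions):
        if not p & ALLOW_ASSEMBLY:
            raise PermissionError(f"document {index} does not allow assembly")
        # Keep a permission only if every source grants it.
        combined &= p
    return combined


# -4 grants everything; -12 is the same with the "modify" bit cleared.
print(merged_permissions([-4, -12]))
```

Intersecting the flags with a bitwise AND is one way to make the result no more permissive than the most restrictive source.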
Final remark: I'm the original developer of iText and I'm responsible for introducing the unethicalreading parameter. I've chosen the name unethicalreading only to make sure people are aware of what they are doing. It doesn't mean that using that parameter is always unethical or illegal.

Add a password to PDF files generated by Phantomjs

I've got some server-side node.js code that generates PDF files on request, using phantomJS, and I'm looking for a way to add password protection to the output.
Sadly I haven't found any mention of such an option in phantom, which makes sense because Chrome doesn't provide that either. Alternately I could run some other tool that would take the PDF created by phantom and add password protection to it, but I can't seem to find any that can do exactly that (add a password to an existing file) and that's completely free to use (preferably, non-GPL).
Will be happy for suggestions on how to approach this task. Thanks!
You can use the node-qpdf package to encrypt and decrypt PDFs; it makes use of qpdf. So you first convert HTML -> PDF, then PDF -> password-protected PDF.
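The second step can also be done by calling the qpdf command-line tool directly. The sketch below is in Python rather than Node and does not use node-qpdf; it assumes qpdf is installed and on the PATH, and the function names, file names and passwords are placeholders.

```python
import subprocess


def build_qpdf_encrypt_command(src, dst, user_password, owner_password, key_length=256):
    """Build the argument list for qpdf's --encrypt mode."""
    return [
        "qpdf",
        "--encrypt", user_password, owner_password, str(key_length),
        "--",  # marks the end of the encryption options
        src,
        dst,
    ]


def encrypt_pdf(src, dst, user_password, owner_password):
    """Run qpdf; raises CalledProcessError if qpdf exits with a non-zero status."""
    command = build_qpdf_encrypt_command(src, dst, user_password, owner_password)
    subprocess.run(command, check=True)


# Example (placeholder names):
# encrypt_pdf("report.pdf", "report-protected.pdf", "open-me", "owner-secret")
```

Passing the arguments as a list avoids shell quoting problems when a password contains spaces or special characters.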

How to know whether a PDF is read-only using iText

I want to place watermark stamps on all the PDFs I currently have, but some are read-only. Is there any way to tell, using iText, whether a file that I've opened is read-only or not editable?
There are two possibilities. One might just be file permissions (you did check those, right?). The other is to see whether there is an encryption dictionary and no user password (in other words, the document is encrypted with an owner password but no user password). In this case the encryption dictionary will have an entry called /P, which is a bitfield of flags for the allowed operations. Table 22 of the ISO PDF spec describes their meaning. Likely the 4th bit (1 << 3) is cleared, which means no modifications.
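To illustrate the bitfield, here is a small sketch that decodes a /P value. It assumes the integer has already been read from the encryption dictionary by whatever library you use; the flag names are my own labels for the bits in Table 22, not identifiers from iText.

```python
# Bit positions from Table 22 of ISO 32000-1 (bit N in the spec is 1 << (N - 1)).
PERMISSION_BITS = {
    "print": 1 << 2,
    "modify": 1 << 3,
    "copy": 1 << 4,
    "annotate": 1 << 5,
    "fill_forms": 1 << 8,
    "extract_for_accessibility": 1 << 9,
    "assemble": 1 << 10,
    "print_high_quality": 1 << 11,
}


def decode_permissions(p):
    """Map each named permission to True (allowed) or False (denied)."""
    return {name: bool(p & mask) for name, mask in PERMISSION_BITS.items()}


def is_read_only(p):
    """True when the 'modify' bit (1 << 3) is cleared."""
    return not p & PERMISSION_BITS["modify"]


print(is_read_only(-3904))  # -3904 has every permission bit cleared
```

/P is stored as a signed 32-bit integer, so typical values are negative; Python's bitwise operators handle negative integers as two's complement, so no masking is needed before testing a bit.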

Using a wordlist to crack alphanumeric password

Let me first say that I'm doing nothing illegal. I'm doing this for learning purposes only. Using my own virtual network.
So I am trying to SSH into a server, and say I know there is a user called urbanslug, so ssh urbanslug@ipaddress, but I need the root password.
I have a wordlist that contains only alphabetic strings, with no digits. How would I use this wordlist to crack an alphanumeric password of mixed case, where the number in the password never goes past 100?
Say the wordlist had the strings:
pass
word
How could I use this list to crack a password such as PaSSword99?
Maybe in ways other than with the use of word lists.
If you can't help me at least tell me why you can't.
I can write a C or Python module to do this but I know that there has to be something out there that already exists.
So you have two things to achieve here. The first is generating the set of passwords you wish to try. The second is throwing that list of passwords against your server.
The first problem is a classic use case for John the Ripper: you can have it read in your wordlist, apply some mangling rules (such as appending 0-99 to each word, permuting cases, etc.), and output a final, complete password list.
The second problem is quite easy to solve once you have the password list. You could just loop over the passwords in bash, but if you're really lazy, Metasploit has an SSH scanner that reads a password list for you.
Of course, breaking this down into two stages means you are storing the huge password list as a file. In general you would be more likely to pipe the output from John The Ripper to your SSH scanner, rather than using an intermediate file.
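To make the first stage concrete, here is a rough Python sketch of the kind of mangling described above. It is not John the Ripper's rule syntax; the function names are hypothetical, and it assumes the password is two wordlist entries joined together, in any mix of cases, followed by a number from 0 to 99.

```python
from itertools import product


def case_variants(word):
    """Yield every upper/lower-case combination of the letters in `word`."""
    choices = [(c.lower(), c.upper()) if c.isalpha() else (c,) for c in word]
    for combo in product(*choices):
        yield "".join(combo)


def candidates(words, max_number=99):
    """Yield joined word pairs, in every case variant, followed by 0..max_number."""
    for first, second in product(words, repeat=2):
        for variant in case_variants(first + second):
            for n in range(max_number + 1):
                yield f"{variant}{n}"


# Example with the two-word list from the question:
# for candidate in candidates(["pass", "word"]):
#     print(candidate)
```

Because these are generators, the candidates can be streamed to the next stage instead of being written to an intermediate file.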
First off it will be difficult to get the root password if you are only logged in as a normal user. However, there are different ways of getting 'root' which I believe go beyond the scope of this forum.
Nonetheless, I don't see where your wordlist comes into play if you already know the characters present in the root password, which would mean you have the root password anyway.
Try using Hashcat to retrieve the password. You do, however, need a wordlist, e.g. rockyou.txt or any of those available on the Openwall site (makers of John the Ripper, another tool that is only as good as your wordlist).
I think it will be easier (faster?) to get root via a local exploit, read /etc/shadow and crack that password.

Make login information secure in Visual Studio

In my program, I have a simple login prompt so that only certain users may enter a program, as well as make the program function differently depending on the user. What I would like to do is have the information for the user login information (username, password, etc.) securely stored without going through an online database. I know that using a text file to store this information is a very bad idea, and I'm sure there is an easier way to do this than to make an array of this login information internally inside my program. Could you all give me some suggestions of a way to do this?
Hashes are what you need. Paste a hash-making function into your code, MD5 functions are available online for all major platforms. Then store your pairs of hashes in your config file. Devise a clever way to combine a password with your admittance options into another hash so that the file is edit-proof. This way, you can distribute the account configuration and if you don't make a trivial cryptographic mistake, it will work just as you want.
Example of the config file line (hashes truncated to 6 chars for clarity):
1a2b3c print;search;evaluate 4d5e6f
Here, 1a2b3c is obtained as MD5(username.Text+verysecret), the verbs are the account's rights and 4d5e6f is obtained as MD5(line[1]+verysecret+password.Text) where line[1] is the split result of the config line where the verbs are stored and the rest is the user's password.
Note how the password gets automatically salted by the verbs and how the verbs are protected against editing because that would invalidate the password hash. The verysecret constant is something hidden in your executable code that will prevent anybody from computing the hashes and unlocking the program.
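A minimal sketch of this config-line scheme, written in Python for illustration. MD5 is used only because the answer names it; the constant value and the function names are hypothetical, and the hashes are kept at full length instead of being truncated.

```python
import hashlib

VERY_SECRET = "replace-me"  # hypothetical constant hidden in the program


def md5_hex(text):
    """Return the hexadecimal MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def make_config_line(username, password, verbs):
    """Build 'userhash verbs passwordhash' as described in the answer."""
    user_hash = md5_hex(username + VERY_SECRET)
    password_hash = md5_hex(verbs + VERY_SECRET + password)
    return f"{user_hash} {verbs} {password_hash}"


def check_login(line, username, password):
    """Return the account's rights if the credentials match, else None."""
    user_hash, verbs, password_hash = line.split(" ")
    if user_hash != md5_hex(username + VERY_SECRET):
        return None
    # The verbs are part of the hashed input, so editing them breaks the match.
    if password_hash != md5_hex(verbs + VERY_SECRET + password):
        return None
    return verbs.split(";")


line = make_config_line("alice", "hunter2", "print;search;evaluate")
print(check_login(line, "alice", "hunter2"))
```

The sketch assumes the verbs contain no spaces, since the line is split on single spaces.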
Hashing is not an asymmetric cipher or key pair; a motivated attacker can crack your program to bypass protection altogether anyway, so going to further lengths is useless.
If you would rather not invest in an asymmetric scheme but are cunning enough, you can change a few initialization constants in that MD5 function. This will make cracking your code harder, especially when it comes to producing a counterfeit account file.
EDIT: When authenticating, don't just if(hashfromconfig == computedhash)... Script kiddies know how to hook into the string comparison function. Write if(MD5(hashfromconfig) == MD5(computedhash))... instead... Then the string comparison will work just as before, only it will not see your precious key hash that goes into a wannabe-counterfeit file. Ideally, have several versions of the MD5 function scattered across your code and named differently. Use if(foo(hashfromconfig) == bar(computedhash))... for a nice effect.
"without going through an online database." - do you mean on the client side?
"securely stored" and "client side" are pretty much mutually exclusive terms in this scenario.
There is absolutely no way to securely store data without touching online (server-side) source of some kind. If you are touching server-side source, it might as well be a DB.