I have a large number of PDF files, and I need to determine programmatically which ones are fillable and which ones are not (because of the PDF security options). Is there any way to do that?
So far the best solution I've tried is a batch script based on pdftk, as suggested here: https://stackoverflow.com/a/4396189/112934. This way, I've discarded all the files that are not password-protected. But I've found some PDFs that, despite being password-protected, are fillable.
Any ideas? I don't mind writing a small Java application if there is some easy-to-use API, but I'd prefer a batch script...
PS: just to clarify - what I need to determine is whether the "Filling of form fields" security option is set to "Allowed" or "Not allowed".
I have a document (pdf or docx, the only accepted formats for multiturn) that contains a lot of subheadings, which translate to follow-up prompts. This all works fine! But I would like to enable context-only for all of my prompts, because the answers are not relevant out of context.
Can I denote this in my document itself? There are way too many to check the box manually for each one.
I could write a script that changes the contextOnly to true on the exported tsv, but this seems like it is a silly workaround.
There does not seem to be any way to indicate whether a question is context-only through the document extraction process, so you will need to automate this with a script. If you don't want to modify the TSV directly, you can use the QnA REST API. You can also access this API through the Bot Framework CLI but I don't know if that makes anything more convenient for you.
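If you do end up scripting the TSV change, a minimal Python sketch could look like the one below. It rests on assumptions you should check against your own export: that the file has a header row, that the flag lives in a column named IsContextOnly, that questions are in a column named Question, and that the flag value is spelled "true". The function name and the keep_top_level parameter are made up for illustration.

```python
def mark_context_only(tsv_text, flag_column="IsContextOnly",
                      question_column="Question", keep_top_level=()):
    """Set the context-only flag on every row except the listed questions.

    Column names and the "true" spelling are assumptions about the export;
    adjust them to whatever your exported TSV actually contains.
    """
    lines = tsv_text.splitlines()
    if not lines:
        return tsv_text

    header = lines[0].split("\t")
    flag_idx = header.index(flag_column)          # ValueError if the column is missing
    question_idx = header.index(question_column)

    out = [lines[0]]
    for line in lines[1:]:
        cells = line.split("\t")
        # Rows with an unexpected number of cells are passed through untouched.
        if len(cells) == len(header) and cells[question_idx] not in keep_top_level:
            cells[flag_idx] = "true"
        out.append("\t".join(cells))
    return "\n".join(out) + "\n"
```

Pass the questions that must stay reachable without context (the entry points of your document) in keep_top_level; everything else is flagged.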
I am trying to make a dynamic PDF generator as a .NET Core API. I want to take an existing PDF or .docx file and edit it so that the current name (John Doe) is replaced with a placeholder such as #NAME_PLACEHOLDER.
I then want to transform #NAME_PLACEHOLDER -> John Doe (or whatever is in the KeyValuePair or Dictionary<string, string>).
I am running this in a Docker environment, so I can easily execute commands, and I am willing to do that as well.
So far I have tried a few things:
1) pdf2htmlEX
Executes as pdf2htmlEX file.pdf
Does the job pretty well
Can be converted back to PDF using Google Chrome headless or similar
Problem: Only the characters already used in the PDF are available for the replacement text. If the PDF only uses A, B and C, a replaced D is rendered in Times New Roman (or whatever the default font is)
2) LibreOffice ODT to PDF
This was pretty nice, because I could simply unzip the .odt file, open content.xml, search and replace, then save it as an .odt file again
Could be converted into PDF rather easily using soffice --convert-to pdf
LibreOffice is quite nice
Problem 1: Microsoft Word -> Save as ODT tends to break the formatting, so we have to use LibreOffice to go and change it back again
Problem 2: We don't want to move away from Microsoft's Office suite
3) HTML to PDF using Chrome Headless
What you see is what you get
By far the best option, if we're all developers and have unlimited time
Problem 1: Only our developers can make changes, since our marketing department does not know HTML
Problem 2: Our existing PDFs would have to be rewritten in HTML
As you can see, I have tried a bunch of things. None of them, except Chrome Headless, has lived up to my expectations. What I really like about #3 is what you see is what you get. I can make the whole thing in HTML, press CTRL+P and see what it looks like as a finished PDF, basically.
I am looking for a better solution, though. It can be paid. It can be free. All I need is to swap out words/phrases for other words dynamically, which is apparently a tough thing to do.
Thanks for clearly specifying what you've already found. It helps a lot in providing a succinct answer.
The conversion is always tricky - I'm sure you know Word has trouble displaying/editing some Word documents itself.
I have experience regarding point #2 "LibreOffice ODT to PDF" and can suggest a few things to test:
Don't use Microsoft Word to do the docx->odt conversion. As you know, it doesn't do it well. Use LibreOffice itself to do this step. The rest of your process remains the same.
For some documents, LibreOffice does doc->odt much better than docx->odt. So, you can instead work with the DOC format and get a better result without any other changes.
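The LibreOffice route (convert to .odt, search and replace in content.xml, convert to PDF) can be sketched in Python roughly as follows. The function names are made up for illustration, and the sketch assumes each placeholder sits in a single text run inside content.xml; if the editor split #NAME_PLACEHOLDER across several spans, a plain string replace will not find it.

```python
import zipfile
from xml.sax.saxutils import escape


def fill_odt(template_path, output_path, values):
    """Copy an .odt file, replacing each placeholder found in content.xml."""
    with zipfile.ZipFile(template_path) as src, \
         zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "content.xml":
                text = data.decode("utf-8")
                for placeholder, value in values.items():
                    # Escape &, < and > so the replacement stays valid XML.
                    text = text.replace(placeholder, escape(value))
                data = text.encode("utf-8")
            # Reusing the original entry keeps the order and per-entry
            # compression, so "mimetype" stays first and uncompressed.
            dst.writestr(item, data)


def soffice_command(input_path, target_format, out_dir):
    """Build the headless LibreOffice command line for one conversion."""
    return ["soffice", "--headless", "--convert-to", target_format,
            "--outdir", out_dir, input_path]
```

A run would then be three steps: subprocess.run(soffice_command("letter.docx", "odt", "work"), check=True), then fill_odt("work/letter.odt", "work/filled.odt", {"#NAME_PLACEHOLDER": "John Doe"}), then the same soffice_command call with "pdf" on the filled file. The file names here are placeholders.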
You won't be able to remove the devs from the process, but you can certainly reduce their role allowing your business/marketing teams to have more direct input simply by:
get the starting point document to the devs to run through the conversion process. The devs can "clean up" the document to make it convert nicely.
make this version of the document the "official" starting point. The business or technical teams can load it, adjust it, and put it back into the process.
if possible, expose a test-platform to the business teams so they can download, adjust, upload and render to PDF. This cycle means they will be able to achieve more and if they're good, do impressive stuff without any dev input.
the above steps simply mean: don't expect perfect conversion of arbitrarily complex documents. Starting from a working baseline (even a complex one) is great.
Some of that might show you that your #2 is actually going to get the best overall results.
I hope that helps.
I want to make a script that locks a PDF file and makes it non-convertible to other formats such as Word or TXT.
Is there any script that makes this possible?
Thank you
You can't. Not really. I mean, there are ways but the effort expended is rarely worth it.
One of the simplest ways is to create the document with an owner password and no user password. This lets anyone open the file, and a compliant viewer will abide by the user access permissions in the encrypt dictionary. The permissions you can set include:
Allow/Deny printing
Allow/Deny copying text/images
Allow/Deny modifying text annotations and form fields
Acrobat will honor these permissions, but nothing stops third-party tools from ignoring them.
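To make the permission flags concrete: they are stored as a single signed 32-bit integer (the /P entry of the encrypt dictionary), with one bit per permission. Below is a minimal Python sketch that decodes such a value. It assumes you have already obtained the /P integer with whatever PDF tool you use; the function and key names are made up for illustration. The bit positions follow the PDF specification's user access permission table, where bit 9 (fill in form fields) only applies to security handlers of revision 3 or later.

```python
# Bit positions are 1-based, as in the PDF specification.
PERMISSION_BITS = {
    "print": 3,
    "modify_contents": 4,
    "copy_text_and_images": 5,
    "annotate_and_fill_forms": 6,
    "fill_forms": 9,
    "extract_for_accessibility": 10,
    "assemble": 11,
    "print_high_quality": 12,
}


def decode_permissions(p):
    """Map each permission name to True (allowed) or False (denied)."""
    return {name: bool(p & (1 << (bit - 1)))
            for name, bit in PERMISSION_BITS.items()}


def can_fill_forms(p):
    """Form filling is allowed if either bit 6 or bit 9 is set."""
    flags = decode_permissions(p)
    return flags["annotate_and_fill_forms"] or flags["fill_forms"]
```

For example, can_fill_forms(-3904) is False because neither bit is set, while can_fill_forms(-3644) is True because bit 9 is set. As noted above, this only tells you what a well-behaved viewer will allow, not what a determined tool can do.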
You could also make your own font that had an "unusual" encoding but reads correctly. This is equivalent to encoding your document with a Caesar cipher, which is not even encryption by any modern definition.
I have had the same question multiple times and researched different answers. I work as an engineer who sometimes writes documents for several subcontracted jobs, and I even sub them out to other technicians. They need my paperwork for those specific jobs, but I don't want anyone copying it. I found that converting an MS Word file to PDF can always be undone. The easiest way for me (though time-consuming) was to print the document, then scan the printout to PDF. That makes the text non-editable. Hope this helps.
We have software that creates user reports and saves them as PDF documents. We're using Ghostscript for this.
I'm aware that PDF is "normally" an export format which is not editable, but one of our customers needs to be able to edit these files (for legal reasons).
I thought it might be possible to store the text in fillable form fields (like Adobe Acrobat offers). Is it possible to create text within a fillable form in a PDF and save it (with free tools like Ghostscript), so that the user can edit it later?
I read the Ghostscript documentation, but I didn't find anything.
Ghostscript isn't really a terrific tool for this. You'd be better off with a PDF generation library which can add the appropriate annotations to the page - if you're wedded to using annotations.
If the "content" must be edited by end users, using widget annotations is not a horribly bad way of doing things, except that every end user needs to have a copy of Acrobat, and if only some people are allowed to edit, you will likely have to play with owner password protection and permissions in order to prevent anyone else from changing field contents.
As for free tools, depending on the usage you could use iText or iTextSharp.
If you are required to take the content of the document and make changes to it on the fly, that's a trickier beast. If you can afford it (and it's certainly not free), my company, Atalasoft, publishes a product that I wrote that lets you build PDF documents from scratch or from templates and embed the .NET objects that create the content into the PDF itself, which means that you can read those objects back out and change the content with a site-specific application, for example.
I want to generate a technical report from Lisp (AllegroCL in my case), and I have studied various packages/projects to help me do this.
Requirements:
Need to generate a PDF
May create an intermediate format like RTF, reStructuredText, HTML, Word DOC or LaTeX
Need to be flexible to be able to add content throughout my application
Need to handle Multi-Page, Headers, Footers, Tables, inclusion of Images.
Possibilities:
cl-pdf and cl-typesetting: I checked this one out and it works for now, but is there a better alternative?
Some LaTeX generator, but ???
Question:
Do you know of alternatives for easily generating (PDF) reports from Lisp? What is the best workflow to go for?
we have been using cl-pdf and cl-typesetting for the last 3 years and they have numerous issues... (like confusion around encodings, or silently not rendering things that don't fit, or...) so i don't recommend basing new development on them.
currently we are in the process of moving all our export mechanisms to open document format. openoffice is all happy with it, and there's a plugin for ms office, too.
there's .fodt, the so called flat open document text format, which is a mere xml file describing a document. generating it is as easy as generating xml files.
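to give an idea of how little is needed, here is a minimal sketch of a flat .fodt generator. it is in Python rather than Lisp purely for illustration, and the function name is made up; the two namespace URIs and the office:mimetype value are the standard OpenDocument ones. a real report would also need styles, tables and so on, which are left out here.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def build_fodt(paragraphs):
    """Return a flat OpenDocument text file (as UTF-8 bytes), one text:p per string."""
    ET.register_namespace("office", OFFICE_NS)
    ET.register_namespace("text", TEXT_NS)

    root = ET.Element(f"{{{OFFICE_NS}}}document", {
        f"{{{OFFICE_NS}}}version": "1.2",
        f"{{{OFFICE_NS}}}mimetype": "application/vnd.oasis.opendocument.text",
    })
    body = ET.SubElement(root, f"{{{OFFICE_NS}}}body")
    text = ET.SubElement(body, f"{{{OFFICE_NS}}}text")
    for paragraph in paragraphs:
        # ElementTree escapes <, > and & in the text on serialization.
        ET.SubElement(text, f"{{{TEXT_NS}}}p").text = paragraph

    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```

writing the result with open("report.fodt", "wb") gives a file that an OpenDocument-aware office suite should be able to open and export to PDF.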
you can also make parts of your document read-only with a password (insert a section and mark it read-only and protected by a password. when generating the xml, you can generate random hashes as password...).