Does anyone know of a technology that allows one to edit the tags in PDFs?

I am looking to programmatically edit the tags in a PDF document. In particular, I would like to be able to copy tags from one document to another and edit them as I copy them over.
I have looked at Coherent PDF, Python's pdfrw and pdfedit, and have not been successful. I am creating the PDFs in LaTeX, so any LaTeX-based solution would be amazing, but I have not come up with anything that allows me to create tags.
Any advice?

Related

How to delete the first page from multiple PDFs

I have a collection of PDFs that sometimes have an info page as the first page of the document, which I want to remove.
Is there a quick way to delete this info page from all of my PDFs, or at least a way to show all PDFs that have more than one page, so I can more easily find the ones that need to be fixed?
Do you know of any program that can do this? Or a way to do this with Python?
Note: The info page has text on it that always remains the same: "LAND TITLE OFFICE".
Using Windows 7 OS
Thanks
Some research turned up the following:
http://www.python.org/workshops/2002-02/papers/17/index.htm
http://www.unixuser.org/~euske/python/pdfminer/index.html
https://pypi.org/project/pypdf/
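For the "at least show all PDFs that have more than one page" part, here is a minimal sketch using the pypdf package linked above. It assumes pypdf is installed (pip install pypdf); the function names are hypothetical, and the page counter is passed in as a parameter so the filtering logic can be checked without real files.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def pypdf_page_count(path: Path) -> int:
    """Count the pages of one PDF using the third-party pypdf package."""
    from pypdf import PdfReader  # imported lazily: only needed when real files are read
    return len(PdfReader(path).pages)


def multi_page_pdfs(
    paths: Iterable[Path],
    count_pages: Callable[[Path], int] = pypdf_page_count,
) -> List[Path]:
    """Return the paths whose PDF has more than one page, sorted by file name."""
    return sorted((p for p in paths if count_pages(p) > 1), key=lambda p: p.name)
```

Called as multi_page_pdfs(Path(r"C:\scans").glob("*.pdf")) (the folder is a placeholder), it returns the files worth checking by hand.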
You can try these two ways:
PDFtk is a utility for manipulating PDFs. Check this link; they are doing something similar to what you need (in the comments, someone also posted a script for Windows).
PDFsam is a powerful graphical tool for manipulating PDFs in bulk. The split and merge sections should do the trick.
Both of them are free. I'd suggest studying the first if you want to write a "recipe" that you can use often, but the latter if you only have to do it once.
You can use the open-source PDFBox as a command-line utility to split PDFs.
The link for PDFBox is here: link
The documentation for splitting a PDF using PDFBox is here: link
You could use the PDFBox extract text functionality from a batch script and combine it with grep to identify pages that contain the text you are looking for. The extract text documentation is here: link
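The same extract-text-and-search idea can also be done in Python. Below is a minimal sketch that swaps in pypdf for PDFBox: it checks whether the first page's extracted text contains "LAND TITLE OFFICE" and, if so, writes a copy of the file without that page. It assumes pypdf is installed and that the marker survives text extraction (it will not for scanned pages without a text layer); the function names are hypothetical.

```python
from pathlib import Path

MARKER = "LAND TITLE OFFICE"  # text the question says always appears on the info page


def is_info_page(page_text: str, marker: str = MARKER) -> bool:
    """True if the extracted text contains the marker, ignoring case and line breaks."""
    # Text extraction often splits words across lines, so collapse all whitespace first.
    normalized = " ".join(page_text.split()).upper()
    return " ".join(marker.split()).upper() in normalized


def strip_info_page(src: Path, dst: Path) -> bool:
    """Write src to dst without its first page if that page is the info page.

    Returns True if a page was removed; nothing is written otherwise.
    """
    from pypdf import PdfReader, PdfWriter  # third-party; imported only when files are processed

    reader = PdfReader(src)
    # Leave single-page files alone: the only page is presumably the document itself.
    if len(reader.pages) < 2 or not is_info_page(reader.pages[0].extract_text() or ""):
        return False
    writer = PdfWriter()
    for page in reader.pages[1:]:  # copy every page except the first
        writer.add_page(page)
    with open(dst, "wb") as out:
        writer.write(out)
    return True
```

Looping over Path(folder).glob("*.pdf") and writing into a separate output folder keeps the originals untouched until you have checked the results.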

Save out a new PDF with updates from users

In my iOS app, I would like to generate a new PDF from an existing PDF after users are done annotating the existing one.
The regenerated PDF should be an exact replica of the existing PDF, but with embedded annotations, highlights, etc. that can be opened and viewed on desktops as well.
I have done some research on this, including the solutions proposed in other SO posts, and I have tried libHaru, among others.
But somehow I am not able to convert an existing PDF into a replica PDF. I am able to add annotations to a new PDF that I create using libHaru.
My problem now is copying the existing PDF as-is into my regenerated PDF. Any pointers would be very helpful.
My understanding is that a library that can save back out a PDF with "true" annotations (those that can be hidden in Acrobat, for example) is not something that exists in a FOSS solution.
libHaru, for example, only supports creating new PDFs, not editing or appending to existing ones. From their homepage:
At this moment libHaru does not support reading and editing existing PDF files and it's unlikely this support will ever appear.
You can render the PDF page by page and then re-save it with some additional information. This Stack Overflow question has a reasonable-looking piece of code. Note, though, that this saves any "annotations" as part of the page image in the PDF itself rather than as true annotations.
You might try a paid library like PDFNet.

Can I create personal bookmarks for PDF documents?

Is there a way to bookmark pages in a PDF document?
For instance, in the PDF documentation of a well-known commercial database, I very often have to navigate to my favorite pages. I'd like to bookmark these so I can reach them with one click. The pre-generated bookmarks in these documents can't serve as personal bookmarks because there are far too many of them.
An obvious alternative is to just bookmark pages in the corresponding HTML documentation. That would be possible for this database's documentation, but not every PDF is available in HTML or has a pleasant HTML rendering.
I have looked through the Acrobat Reader menus, searched SO, and googled a bit, with no result.
That depends on your PDF reader.
On Linux you could use Okular, which even lets you mark text or add comments in your documents. This information is stored in a separate file.

Automatically generate buttons in PDF

I am looking for some creative ideas for making numbers clickable in a PDF file. The PDF file is very large, and each page contains many numbers in the following format:
[00-00]
What is the best route to explore? Right now, the only idea that I have brainstormed is:
The PDF is created from Adobe InDesign, so perhaps I could "hook" into InDesign before the PDF is created.
I am looking to do this in a highly automated way, as there are a lot of numbers on a lot of pages.
Thanks!
There are various clever things you can do with GREP styles in CS4 and CS5; they might be worth a try. You can also do a lot of scripting in InDesign using JavaScript. I'd start there.
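Whichever route you take, the first step is matching the numbers reliably. Assuming every reference has exactly the [00-00] shape (two digits, a hyphen, two digits, in square brackets), the regular expression \[(\d{2})-(\d{2})\] describes it. As far as I know, InDesign's GREP also supports \d and {n} repetition, so a similar expression should work in a GREP style, though that is untested here. The Python sketch below only demonstrates what the pattern does and does not match; find_refs is a hypothetical helper.

```python
import re
from typing import List, Tuple

# Assumed format: "[" + two digits + "-" + two digits + "]", e.g. "[03-17]".
REF_PATTERN = re.compile(r"\[(\d{2})-(\d{2})\]")


def find_refs(text: str) -> List[Tuple[str, str]]:
    """Return the (left, right) digit pairs of every bracketed reference in text."""
    return REF_PATTERN.findall(text)
```

For example, find_refs("See [01-02] and [10-99], not [1-2].") returns [("01", "02"), ("10", "99")]; the single-digit "[1-2]" is deliberately not matched.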

Web-based form to collect data and populate a fillable PDF

Can anyone suggest a script that would allow me to create an HTML or PHP web-based form to collect data and save it, and then use that data to populate a fillable PDF?
If you have an existing PDF that you want to populate, and that PDF just has text fields (no checkboxes or radio buttons) then CAM::PDF may be able to help you. You can use it as a Perl library directly, or use its command-line interface. CAM::PDF is not useful for generating PDFs from scratch, however. Furthermore, if you have embedded fonts, then you need to ensure that all of the characters you plan to insert are represented in the embedded font.
Use a normal web page to get the data. If you're not sure how to do that, search Google for "php forms"; there are plenty of tutorials.
Then use a PHP PDF generator, like this one, to create the PDF file. If you look hard enough, you will probably find a PDF generator that lets you use a template with placeholders where the entered data should go.
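If the PDF already exists as a fillable form (the CAM::PDF case above), the filling step can also be done in Python. Here is a minimal sketch that swaps in pypdf for CAM::PDF. It assumes pypdf is installed, that the form has text fields only (the same restriction as above), and that you know the PDF's field names; FIELD_MAP and its entries are hypothetical and map your web form's input names to the PDF's field names.

```python
from typing import Dict, Mapping

# Hypothetical mapping: web form input name -> field name inside the fillable PDF.
FIELD_MAP = {"full_name": "Name", "email": "Email"}


def to_pdf_fields(form: Mapping[str, str], field_map: Mapping[str, str] = FIELD_MAP) -> Dict[str, str]:
    """Translate submitted form data into PDF field values, skipping unknown or empty inputs."""
    values = {}
    for web_name, pdf_name in field_map.items():
        value = form.get(web_name, "").strip()
        if value:
            values[pdf_name] = value
    return values


def fill_pdf(template_path: str, output_path: str, form: Mapping[str, str]) -> None:
    """Copy the fillable template and write the submitted values into its text fields."""
    from pypdf import PdfReader, PdfWriter  # third-party; imported only when a PDF is written

    writer = PdfWriter()
    writer.append(PdfReader(template_path))  # copy the template's pages and form fields
    fields = to_pdf_fields(form)
    for page in writer.pages:
        writer.update_page_form_field_values(page, fields)
    with open(output_path, "wb") as out:
        writer.write(out)
```

Keeping the web-to-PDF name mapping in one dictionary means the web form can be changed without touching the PDF template, and vice versa.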