pdf to searchable ps - pdf

I know people have asked similar questions, but I couldn't find an answer to this one. I have a PDF file that was produced using pdflatex. It is searchable (you can press Ctrl+F and search for words inside) and it uses hyperref for the citations. I want to make a PS file out of it.
I tried pdf2ps from gs and pdftops from the poppler package. Both render the document as if it were a picture: you cannot search for anything inside, and the hyperref links don't work.
Is there any way I can make a PS file but at least keep it searchable?
Thanks in advance!

Why do you want to 'search inside' a PostScript file? A PostScript file is for printing.
What do you mean by 'hyperrefs don't work'? What do you expect them to do?

Consider this tool: renderpdf
I don't know about citations, but it produces searchable PostScript files.

Automatically replace the PlantUML code in an .adoc file with the corresponding PNG images

I have a big .adoc file with PlantUML diagrams in it. The main goal is to convert the file from AsciiDoc to Markdown. For that, the .puml diagrams have to be replaced with images (PNG). It's possible to extract the PNGs from an .adoc file, but so far I have not found a tool or library that can replace the .puml diagrams with their corresponding PNG versions.
Does someone have a solution for this? I guess the last resort would be to write a bash script that does it automatically (with sed commands), but I would love not to do that.
I would be grateful for answers. Have a nice day!
If you convert the PlantUML to PNG, you cannot maintain the diagrams anymore. I assume you want to keep the documents up-to-date and use Markdown as the source of truth.
You can keep PlantUML inside Markdown, similarly to what you do in AsciiDoc. The only extra is that you have to use a preprocessor:
https://github.com/verhas/jamal
If you use IntelliJ, you can edit the Markdown text with macros, including the PlantUML text, and see the formatted text with the diagram in WYSIWYG on the right pane. You will have the XXX.md.jam and the XXX.md automatically saved simultaneously using the Asciidoctor plugin and the Jamal preprocessor with it.
I created Jamal. It is open-source. If you need any more help, you can reach me.
It may not be the answer you are looking for, but may also be.

Does anyone know of a technology that allows one to edit the tags on pdfs?

I am looking to programmatically edit the tags in a PDF document. In particular, I would like to be able to copy tags from one document to another and edit them as I copy them over.
I have looked at Coherent PDF, Python's pdfrw and Python's pdfedit, and have not been successful. I am creating the PDFs in LaTeX, so any LaTeX-based solution would be amazing, but I have not come up with anything that allows me to create tags.
Any advice?

How to delete the first page from multiple PDFs

I have a collection of PDFs that sometimes have an info page as the first page of the document, which I want to remove.
Is there a quick way to delete this info page from all of my PDFs, or at least a way to show all PDFs that have more than one page so I can more easily find the ones that need to be fixed?
Do you know of any program that can do this? Or a way to do this with Python?
Note: The info page has text on it that always remains the same: "LAND TITLE OFFICE"
Using Windows 7 OS
Thanks
Some research turned up the following:
http://www.python.org/workshops/2002-02/papers/17/index.htm
http://www.unixuser.org/~euske/python/pdfminer/index.html
https://pypi.org/project/pypdf/
You can try these two ways:
PdfTK is a utility for manipulating PDFs. Check this link, they are doing something similar to what you need (in the comments someone also posted a script for Windows).
PDFsam is a powerful graphical tool for manipulating PDFs in bulk. The split and merge sections should do the trick.
Both of them are free. I'd suggest studying the first if you want to write a "recipe" that you can reuse often, and the latter if you only have to do it once.
You can use the open-source PDFBox as a command-line utility to split PDFs.
The link for PDFBox is here: link
The documentation for splitting a PDF using PDFBox is here: link
You could use the PDFBox extract text functionality from a batch script and combine with grep to identify pages that contain the text you are looking for. The extract text documentation is here: link
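Since the question also asks for a way to do this with Python, here is a minimal sketch using pypdf (linked above). It assumes the marker text "LAND TITLE OFFICE" can be extracted from the first page as text (a scanned image page would not match), and the function names `has_info_page`, `strip_info_page` and `process_folder` are made up for illustration. Documents with only one page are left alone.

```python
from pathlib import Path

# Text that, according to the question, always appears on the info page.
MARKER = "LAND TITLE OFFICE"


def has_info_page(first_page_text, page_count, marker=MARKER):
    """Return True when the first page looks like the info page."""
    # Never strip the only page of a document.
    if page_count < 2:
        return False
    # Extracted text can contain odd spacing and line breaks, so compare
    # on whitespace-normalised, upper-cased text.
    normalised = " ".join((first_page_text or "").split()).upper()
    return marker.upper() in normalised


def strip_info_page(src, dst):
    """Write src to dst without its info page; return True if one was removed."""
    # Third-party dependency (pip install pypdf), imported here so the
    # helper above stays usable without it.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(str(src))
    page_count = len(reader.pages)
    text = reader.pages[0].extract_text() if page_count else ""
    if not has_info_page(text, page_count):
        return False

    writer = PdfWriter()
    for index in range(1, page_count):  # skip page 0, keep the rest
        writer.add_page(reader.pages[index])
    with open(dst, "wb") as handle:
        writer.write(handle)
    return True


def process_folder(folder, out_dir):
    """Strip the info page from every PDF in folder, writing results to out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for pdf in sorted(Path(folder).glob("*.pdf")):
        changed = strip_info_page(pdf, out_dir / pdf.name)
        print(pdf.name, "-> first page removed" if changed else "-> left alone")
```

Calling `process_folder(".", "stripped")` would write the trimmed copies into a hypothetical `stripped` folder and leave the originals untouched, which makes it easy to check the results before replacing anything.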

open source pdf editor based on mupdf

This is not a program code question.
I would like to know if you know an open source pdf editor based on mupdf.
In fact, I just need the following features:
Highlight a selected rectangle. (And the ability to delete a highlighted rectangle)
Add a line through some text (to indicate that this text should be deleted or ignored). (And the ability to delete this line)
Rotate a page or an entire pdf file.
Add a comment (annotation).
Thank you in advance.
I don't think there is a useful PDF editor based on MuPDF.
See (1) Editing a PDF with MuPDF and (2) MuPDF on Wikipedia.
There are, however, other PDF editors that are free software:
PDFedit
Pdftk
Hope that this will be of some help :)

Where does Preview store PDF annotations on OS X Lion?

I'm working on a tool in Python to extract highlighted passages from PDF files. I regularly highlight PDFs in Preview on OS X Lion but haven't found a good tool to extract these passages. Other apps, such as Skim, do allow you to highlight and export, but I figure there has to be a way to extract the highlights I add in Preview.
I figured that the highlights would be stored in the HFS+ extended attributes for the PDF file but after looking at them using xattr it seems that they're stored elsewhere. I also looked at PDFKit but I only saw how to create annotations rather than locate them.
If someone could tell me where to find the highlights/annotations or point me at some documentation that explains this I would really appreciate it.
When using PDFKit, you can get the annotations from any PDFPage instance.
[myPDFPage annotations] will return an array of annotations for that particular page.
See the docs for more info.
Technically speaking, highlighting parts of a PDF is adding an annotation to the file. These annotations are PDF objects defined in the PDF specification. They are stored inside the PDF file itself, i.e. they do modify the original file! That's why you'll not find a trace of the highlights in the HFS+ extended attributes...
So the answer to the question of your title line is: Preview stores the highlights inside the PDF file as fully compliant PDF objects.
The answer to your real question implied in your text ('I want to extract the highlighted passages') was well answered by sosborn.
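Because the highlights live inside the PDF as annotation objects, they can also be read from Python, which is what the question is working in. The following is a rough sketch using pypdf, which is a swapped-in library and not something the answers above mention. The function names `highlight_annotations` and `list_highlights` are made up for illustration. Note that a highlight annotation records the region it covers (`/Rect`, and usually `/QuadPoints`) plus an optional `/Contents` note; recovering the highlighted words themselves would still require matching those regions against the page text.

```python
def _resolve(obj):
    """Follow a pypdf indirect reference; plain Python objects pass through."""
    return obj.get_object() if hasattr(obj, "get_object") else obj


def highlight_annotations(annots):
    """Yield (rect, contents) for every highlight in a page's /Annots list."""
    for ref in _resolve(annots) or []:
        annot = _resolve(ref)
        # Highlights are annotations whose /Subtype is /Highlight.
        if _resolve(annot.get("/Subtype")) != "/Highlight":
            continue
        rect = [float(value) for value in _resolve(annot.get("/Rect", []))]
        yield rect, _resolve(annot.get("/Contents"))


def list_highlights(path):
    """Return (page_number, rect, contents) for each highlight in the file."""
    # Third-party dependency (pip install pypdf), imported lazily.
    from pypdf import PdfReader

    found = []
    reader = PdfReader(str(path))
    for number, page in enumerate(reader.pages, start=1):
        annots = page["/Annots"] if "/Annots" in page else []
        for rect, contents in highlight_annotations(annots):
            found.append((number, rect, contents))
    return found
```

`list_highlights("paper.pdf")` (a hypothetical file name) would return one tuple per highlight, with the page number and the bounding rectangle in PDF user-space units.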