Automatically replace the PlantUML code in an .adoc file with the corresponding PNG images - automation

I have a big .adoc file with PlantUML diagrams in it. The main goal is to convert the file from AsciiDoc to Markdown. For that, the .puml diagrams have to be replaced with images (PNG). It's possible to extract the PNGs from an .adoc file, but so far I have not figured out whether there is a tool, library, etc. that could replace the .puml diagrams with their corresponding PNG versions.
Does someone have a solution for this? I guess the last resort would be to write a Bash script that does it automatically (sed commands), but I would love not to do that.
I would be grateful for answers. Have a nice day!

If you convert the PlantUML to PNG, you cannot maintain the diagrams anymore. I assume you want to keep the documents up-to-date and use Markdown as the source of truth.
You can keep PlantUML inside Markdown, similarly to what you do in AsciiDoc. The only extra is that you have to use a preprocessor:
https://github.com/verhas/jamal
If you use IntelliJ, you can edit the Markdown text with macros, including the PlantUML text, and see the formatted text with the diagram in WYSIWYG on the right pane. You will have the XXX.md.jam and the XXX.md automatically saved simultaneously using the Asciidoctor plugin and the Jamal preprocessor with it.
I created Jamal. It is open-source. If you need any more help, you can reach me.
It may not be the answer you are looking for, but then again it may be.
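If you do want the direct replacement the question asks about, it can be sketched in a few lines of Python instead of sed. This is only a minimal sketch under assumptions: it assumes the diagrams are written as Asciidoctor Diagram blocks of the form `[plantuml, name, png]` followed by a `----` or `....` delimited body, and the function name `replace_plantuml` and the fallback names `diagram-N` are made up for illustration. It swaps each block for an `image::name.png[]` macro and returns the PlantUML sources so they can be rendered separately.

```python
import re

# Matches an Asciidoctor Diagram block such as:
#   [plantuml, login-flow, png]
#   ----
#   @startuml ... @enduml
#   ----
# Assumption: the target name is the first positional attribute and the
# body is delimited by a line of ---- or .... characters.
BLOCK = re.compile(
    r"^\[plantuml(?:[ \t]*,[ \t]*(?P<name>[^,\]\s=]+)(?=[ \t]*[,\]]))?[^\]\n]*\]\n"
    r"(?P<fence>-{4,}|\.{4,})\n"
    r"(?P<body>.*?)\n"
    r"(?P=fence)$",
    re.MULTILINE | re.DOTALL,
)


def replace_plantuml(adoc_text):
    """Return (new_text, sources); sources maps image name -> PlantUML source."""
    sources = {}

    def swap(match):
        # Blocks without a name get a generated one (hypothetical scheme).
        name = match.group("name") or f"diagram-{len(sources) + 1}"
        sources[name] = match.group("body")
        return f"image::{name}.png[]"

    return BLOCK.sub(swap, adoc_text), sources
```

Each entry in `sources` can then be written to a `.puml` file and rendered with PlantUML's `-tpng` option before the rewritten .adoc is converted to Markdown. Blocks that use a different attribute style would need the pattern adjusted.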

Related

Does anyone know of a technology that allows one to edit the tags on pdfs?

I am looking to programmatically edit the tags in a PDF document. In particular, I would like to be able to copy tags from one document to another and edit them as I copy them over.
I have looked at Coherent PDF, Python's pdfrw, and Python's pdfedit, and have not been successful. I am creating the PDFs in LaTeX, so any LaTeX-based solution would be amazing, but I have not come up with anything that allows me to create tags.
Any advice?

Get text from a pdf in NSString

I am trying to make an iOS app that extracts plain text from a PDF file and displays it in a UITextView. It's not simply a PDF reader for viewing a file; I would later like to perform certain operations on that text.
I have already googled a lot but have still not been able to find an exact solution.
I already tried using https://github.com/zachron/pdfiphone
but the files use the ARMv6 architecture, which seems obsolete with Xcode 4.5.
If anyone can suggest some exact and non-confusing code using the Quartz 2D framework of iOS, that would be great.
Here is some sample code to extract text from a PDF; I hope it helps.
https://github.com/zachron/pdfiphone
This is a library to get the text out of a PDF for the iPhone.
Another demo uses OCR technology; see the link below:
https://github.com/nolanbrown/Tesseract-iPhone-Demo
Also check this page of the Quartz 2D Programming Guide; it covers everything you need to open and parse a PDF file in iOS. Note that it is not a simple task, since there's no method to extract the full text in one line. You have to work with the data as an input stream, using a CGPDFScanner.
Two other libraries:
https://github.com/KurtCode/PDFKitten/
https://github.com/mobfarm/FastPdfKit
This question comes up all the time. It is VERY hard to extract text from PDF in general. The PDF specification is not designed with text extraction in mind. There are many libraries that try to do the job, essentially by reconstructing the text from the geometric placement of the individual glyphs. These libraries have varying degrees of success, but will all fail on certain PDF documents. In fact, some PDF documents have Glyphs but no way to associate the glyph with a character. For these documents it is simply not possible to extract text, short of using some kind of OCR approach.
PDF is designed as a read-only format that is portable in the sense that a PDF document will be rendered identically on any platform. That is what it is best at, and what it should be used for.
If text is to be edited, do not use PDF.
Here (Extracting text from pdf using objective-c) I found an answer to your question, and it works, but not as well as I need :(
It can extract only ASCII.
It returns only one paragraph.
Good luck.

Where does Preview store PDF annotations on OS X Lion?

I'm working on a tool in Python to extract highlighted passages from PDF files. I regularly highlight PDFs in Preview on OS X Lion but haven't found a good tool to extract these passages. Other apps, such as Skim, do allow you to highlight and export, but I figure there has to be a way to extract the ones I add in Preview.
I figured that the highlights would be stored in the HFS+ extended attributes for the PDF file but after looking at them using xattr it seems that they're stored elsewhere. I also looked at PDFKit but I only saw how to create annotations rather than locate them.
If someone could tell me where to find the highlights/annotations or point me at some documentation that explains this I would really appreciate it.
When using PDFKit, you can get the annotations from any PDFPage instance.
[myPDFPage annotations] will return an array of annotations for that particular page.
See the docs for more info.
Technically speaking, highlighting parts of a PDF is adding an annotation to the file. These annotations are PDF objects defined in the PDF specification. They are stored inside the PDF file itself, i.e. they do modify the original file! That's why you'll not find a trace of the highlights in the HFS+ extended attributes...
So the answer to the question of your title line is: Preview stores the highlights inside the PDF file as fully compliant PDF objects.
The answer to your real question implied in your text ('I want to extract the highlighted passages') was well answered by sosborn.
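Since the question mentions a Python tool, here is a rough way to confirm from Python that the highlights really are inside the file. It is only a heuristic sketch using the standard library: it assumes the annotation dictionaries are stored uncompressed (a PDF that packs them into compressed object streams needs a real PDF parser), and the function names are made up for illustration. It looks for annotation dictionaries with `/Subtype /Highlight` and reads their `/QuadPoints` arrays, which give the corners of the highlighted regions.

```python
import re

# Heuristic patterns over the raw file bytes.
# Assumption: annotation dictionaries are not inside compressed streams.
HIGHLIGHT = re.compile(rb"/Subtype\s*/Highlight\b")
QUADPOINTS = re.compile(rb"/QuadPoints\s*\[([^\]]*)\]")


def count_highlights(pdf_bytes):
    """Count dictionaries that declare themselves highlight annotations."""
    return len(HIGHLIGHT.findall(pdf_bytes))


def quad_points(pdf_bytes):
    """Return one list of floats per /QuadPoints array found in the bytes."""
    return [
        [float(n) for n in raw.decode("ascii").split()]
        for raw in QUADPOINTS.findall(pdf_bytes)
    ]
```

With a real file you would pass `open(path, "rb").read()`. The quads only give the highlighted regions; recovering the text under them still requires matching them against the page's text positions.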

A better file format than PDF or EPUB?

My client wants us to build a custom document viewer for their app. (It really, truly needs to be custom, because there are a ton of application-specific features they need.)
We built one for them last year that took PDFs, generated page images, and backed the images using a hidden layer of text that could be selected and copied. We did it in Flex. It was a nightmare. PDF is horrid.
This year, we need to build one in HTML 5 with similar requirements, except that most of the documents now are in Word or HTML, that is, they have reflowable text, instead of the fixed layout and glyphs of PDF. But they still want to do PDF in the same viewer.
I'm thinking that we need to convert all documents to some common file format that can handle both reflowable text and also the fixed-position glyphs of PDF. (Each document would probably support one or the other, but not both). It would be nice if it were an XML-like markup language that would say:
<text>here's some text</text>
-- or --
<glyph letter="a" name="my_a_glyph" position="10,10"/>
<image src="my_image" position="20,20"/>
or something like that.
Is there any existing file format out there that can handle it? EPUB won't do the fixed-position text, and PDF sucks in too many ways to describe.
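To make the idea concrete, this is how a viewer might read such markup. It is purely illustrative: the element and attribute names come from the snippet above, while the `<document>` wrapper element and the `read_items` function are assumptions added only so the example parses as well-formed XML.

```python
import xml.etree.ElementTree as ET


def read_items(xml_text):
    """Parse the hypothetical markup into (kind, content, position) tuples."""
    root = ET.fromstring(xml_text)  # assumes a single wrapper element
    items = []
    for node in root:
        if node.tag == "text":
            # Reflowable text carries no position.
            items.append(("text", node.text or "", None))
        else:
            # Fixed-layout items carry an "x,y" position attribute.
            x, y = (int(v) for v in node.get("position").split(","))
            label = node.get("letter") or node.get("src")
            items.append((node.tag, label, (x, y)))
    return items
```

A renderer could then lay out the `text` items as flowing content and place the `glyph` and `image` items absolutely at their coordinates.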
I think you can look at the FB2 (FictionBook 2) format. It is an XML-based format designed for publishing books. It includes images, though I am not sure if they can be positioned absolutely.
Also, you can simply go with HTML and do HTML-to-PDF rendering when needed (there exist various components and libraries for this). I don't see (or you have not listed) any reasons why this way wouldn't work.
GROFF? Maybe build a macro library to customize it, as needed.
Groff/troff/nroff, the "run off" programs of Unix, can output to postscript or HTML. The jump from postscript to PDF is built in to some PDF viewers; there are also several existing programs for it, pstopdf, for example.
GROFF has some fixed-layout options and some flow-like options. With GROFF, it's almost easier to base most of the printout on flowing text, within prescribed bounds.

Best way to generate a custom document?

I am working on generating a document for printing. It should use a specific TTF font and everything must be printed with vector graphics (for quality). Some of the text should be replaced automatically (e.g. current time). Also it should include a custom-generated EPS image with a chart.
Ideally I would like to have some kind of document template where the text could be replaced easily, and it would be nice if it could import the image by path. But I am not sure which format would be good for this. The best I can think of is LaTeX, but I don't like that it takes a lot of manual work to use it with TTF fonts. Any other ideas?
By the way, I am using OS X...
The memoir package is very flexible for special layouts like yours.
XeTeX uses your system fonts (it is installed together with TeX Live).
You could blend most of those elements into an EPS using ImageMagick or GIMP Script-Fu.
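The template part can be handled by generating the XeLaTeX source from a small script. The following is a minimal sketch, assuming Python is available and using its standard `string.Template`; the placeholder names, the `render` function, and the example font and file names are all hypothetical. It fills in the font, the current time, and the path to the chart, and leaves the LaTeX markup untouched.

```python
from datetime import datetime
from string import Template

# $name placeholders are substituted; LaTeX backslashes and braces pass through.
TEMPLATE = Template(r"""\documentclass{memoir}
\usepackage{fontspec}
\usepackage{graphicx}
\setmainfont{$font}
\begin{document}
Generated at $timestamp.

\includegraphics{$chart}
\end{document}
""")


def render(font, chart_path, now=None):
    """Return XeLaTeX source with the font, chart path, and time filled in."""
    now = now or datetime.now()
    return TEMPLATE.substitute(
        font=font,
        chart=chart_path,
        timestamp=now.strftime("%Y-%m-%d %H:%M"),
    )
```

For example, `render("Gill Sans", "chart.eps")` returns source that can be saved to a .tex file and compiled with `xelatex`. Whether the EPS is included directly depends on the TeX installation; converting it to PDF first is a common fallback.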
There are several products out there that will build you a PDF programmatically. I've only used the ColdFusion Report Builder myself, and that may not be practical or affordable for your application. If your budget allows, I'd look into a commercial reporting product. I know Adobe has several that will generate Flash, FlashPaper or PDF output.