PDF generators from an XML template? - pdf

Are there any PDF generators, commercial or open source, that can be used for research purposes? I am looking for something like pdfnow.com, or a standalone desktop app, that lets me generate a PDF from an XML template. I have tried researching this, but there is a lot of ambiguity going around.

Applidok generates PDFs from an original (raw) PDF, a template definition and dynamic/user data (e.g. from a form): http://go.applidok.com/en/howitworks.gz.html
The template format there is JSON, not XML, but the approach is the same.

Related

ReadTheDocs generates PDFs without my HTML tables

We are converting a sizeable document for hosting on ReadTheDocs. We weren't happy with the simple presentation enabled by Markdown table syntax, so we coded our tables as HTML. Very nice in the HTML viewer (e.g., the end of http://manual.cytoscape.org/en/latest/Command_Line_Arguments.html).
In the PDF version generated by ReadTheDocs, each of our tables is completely missing (see page 9 on https://media.readthedocs.org/pdf/cytoscape-working-copy/latest/cytoscape-working-copy.pdf).
Have we made a mistake by coding tables as HTML? Could we have taken a different route and gotten nice tables in both HTML and PDF?
Any advice would be helpful ...
Thanks!
I have not used ReadTheDocs myself, but from reading their Getting Started guide, I assume you are using Sphinx? While Markdown supports embedding raw HTML, Sphinx does not support converting it to other formats.
You should consider moving to reStructuredText (Sphinx's native markup format), as it is much more advanced than Markdown. It can even be extended with custom directives and roles, should you need this. But be sure to first check whether reStructuredText tables offer the flexibility you require. Pandoc can convert your Markdown files to reStructuredText.
I see you are using a table to document command line options. reStructuredText supports documenting command line options using option lists. In theory, you could change how option lists are represented in the output document, but this might not be easy to accomplish, especially for PDF output using LaTeX (shameless plug: using rinohtype for PDF output should make this much easier in the future).

WordML to PDF conversion

We receive WordML documents, which are basically XML files generated from MS Word docs and which also contain all the formatting instructions. We now have a requirement to convert these files to PDF. I looked at iText XMLWorker for this conversion. It simply removed all the XML tags and gave me all the content as a single paragraph in the PDF, with no formatting.
How can I make sure that the generated PDF contains the text with the correct formatting from the WordML doc?
iText's product XMLWorker requires you to handle each XML element manually (unless you have HTML as input). The XML schema for MS Word documents is extremely complicated, so you'd be working on that for a few years to get something that looks even remotely ok. In short, XMLWorker doesn't do what you think it does.
If you want MS Word to PDF conversion, you need another kind of solution. XDocReport (MIT license) is one of these, and it has plugins for both iText 2 (LGPL license) and iText 5 (AGPL license). Results are not perfect though.

How to create product catalogue in pdf

In this case I have an XML data source and external image files which together represent a product catalogue. The basic structure of the XML document is as follows:
categories
subcategories
products
I'm looking for a tool to convert the described data source to a PDF document, preferably with basic navigation functionality and a hierarchical structure. I could probably do it by writing an XSLT stylesheet, or by writing code in some scripting language to generate a TeX document. Can anyone recommend a good LaTeX style for a product catalogue, or an open source tool for generating PDF catalogues?
I suggest the well-known iText Java library for PDF generation.
You can load data from XML and generate any type of PDF.
You have to write Java classes to create these types of documents.
I'd use XSL-FO if I thought a transformation would work or iText if I was writing a Java app and expressing it in code worked better.
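The other route mentioned in the question, a script that walks the XML and writes a TeX document, could look roughly like the sketch below. It is in Python and uses only the standard library. The element and attribute names (`category`, `subcategory`, `product`, `name`, `price`, `image`) are assumptions for illustration, since the question only gives the three nesting levels. Loading `hyperref` is one way to get PDF bookmarks for basic navigation.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the element and attribute names are assumptions,
# the question only says categories > subcategories > products.
SAMPLE_XML = """\
<catalogue>
  <category name="Tools">
    <subcategory name="Hand tools">
      <product name="Hammer" price="12.50" image="img/hammer.png"/>
      <product name="Saw &amp; blade" price="18.00" image="img/saw.png"/>
    </subcategory>
  </category>
</catalogue>
"""

# Characters that have a special meaning in LaTeX text mode.
_LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_",
    "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def latex_escape(text):
    """Escape LaTeX special characters one character at a time."""
    return "".join(_LATEX_SPECIALS.get(ch, ch) for ch in text)


def catalogue_to_latex(xml_text):
    """Turn the three-level catalogue XML into LaTeX source text."""
    root = ET.fromstring(xml_text)
    lines = [
        r"\documentclass{article}",
        r"\usepackage{graphicx}",
        r"\usepackage{hyperref}",  # sections become PDF bookmarks
        r"\begin{document}",
        r"\tableofcontents",
    ]
    for category in root.findall("category"):
        lines.append(r"\section{%s}" % latex_escape(category.get("name", "")))
        for sub in category.findall("subcategory"):
            lines.append(r"\subsection{%s}" % latex_escape(sub.get("name", "")))
            for product in sub.findall("product"):
                lines.append(r"\paragraph{%s}" % latex_escape(product.get("name", "")))
                lines.append("Price: %s" % latex_escape(product.get("price", "")))
                image = product.get("image")
                if image:
                    # The image path is passed through unescaped.
                    lines.append(r"\includegraphics[width=3cm]{%s}" % image)
                lines.append("")  # blank line ends the paragraph
    lines.append(r"\end{document}")
    return "\n".join(lines) + "\n"
```

Writing `catalogue_to_latex(SAMPLE_XML)` to a `.tex` file and running it through `pdflatex` would then produce the PDF, provided the image files exist at the given paths.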

Generating PDF documents from LISP

I want to generate a technical report from Lisp (AllegroCL in my case) and I studied various packages/projects to help me do this.
Requirements:
Need to generate a PDF
May create an intermediate format like RTF, reStructuredText, HTML, Word DOC or LaTeX
Need to be flexible to be able to add content throughout my application
Need to handle Multi-Page, Headers, Footers, Tables, inclusion of Images.
Possibilities:
cl-pdf and cl-typesetting: I checked this one out and it works for now, but is there a better alternative?
Some Latex generator, but ???
Question:
Do you know alternatives to easily generate (PDF) reports from Lisp? What is the best workflow to go for?
we have been using cl-pdf and cl-typesetting for the last 3 years and they have numerous issues... (like confusion around encodings, or silently not rendering things that don't fit, or...) so i don't recommend new development based on them.
currently we are in the process of moving all our export mechanisms to open document format. openoffice is all happy with it, and there's a plugin for ms office, too.
there's .fodt, the so called flat open document text format, which is a mere xml file describing a document. generating it is as easy as generating xml files.
you can also make parts of your document read-only with a password (insert a section and mark it read-only and protected by a password. when generating the xml, you can generate random hashes as password...).
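to give an idea of how little is needed for a flat .fodt file, here is a rough sketch. it is written in Python rather than Lisp, only to keep it short and standard-library only; the same structure can be emitted by any XML generator. the function name `build_fodt` is made up for this example. real documents normally also carry style definitions, so treat this as a bare skeleton rather than a complete export.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the OpenDocument specification.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Use the conventional prefixes instead of ns0/ns1 in the output.
ET.register_namespace("office", OFFICE_NS)
ET.register_namespace("text", TEXT_NS)


def build_fodt(title, paragraphs):
    """Return a minimal flat OpenDocument text file as a string.

    build_fodt is a hypothetical helper for illustration only.
    """
    root = ET.Element(
        "{%s}document" % OFFICE_NS,
        {
            "{%s}version" % OFFICE_NS: "1.2",
            "{%s}mimetype" % OFFICE_NS: "application/vnd.oasis.opendocument.text",
        },
    )
    body = ET.SubElement(root, "{%s}body" % OFFICE_NS)
    text = ET.SubElement(body, "{%s}text" % OFFICE_NS)

    # One level-1 heading followed by plain paragraphs.
    heading = ET.SubElement(
        text, "{%s}h" % TEXT_NS, {"{%s}outline-level" % TEXT_NS: "1"}
    )
    heading.text = title
    for paragraph in paragraphs:
        ET.SubElement(text, "{%s}p" % TEXT_NS).text = paragraph

    # The caller is expected to write this string to disk as UTF-8.
    declaration = '<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(root, encoding="unicode")
```

saving the result with a `.fodt` extension (encoded as UTF-8) gives a file that an office suite can open and export to PDF.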

Generate PDF from structured data

I want to be able to generate a highly graphical (with lots of text content as well) PDF file from data that I might have in a database or xml or any other structured form.
Currently our graphic designer creates these PDF files in Photoshop manually after getting the content as an MS Word document. But usually there are more than 20 revisions of the content: small changes here and there, spelling corrections, etc.
The 2 disadvantages are:
1) The graphic designer's time is unnecessarily occupied. The first version is the only one he/she should have to work on.
2) The PDF file becomes the document which now has the final revised content, and the initial content is out of sync with it. So if the initial content needs to be somewhere else (like on a website), we need to recreate it from the PDF file.
Generating the PDF file will help me solve both these problems. Perhaps some way in which the graphic designer creates a "Template" and then puts in tags/holders and maps these tags/holders to the relevant data.
Thanks :-)
There are some tools out there for doing this. XSL-FO is useful. Here is a tutorial for creating a pdf from xml (or xhtml) with cocoon. Also see Apache FOP.
You could format your SQL data as XML and still use the same templates this way.
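As a rough illustration of formatting SQL data as XML, the Python sketch below turns the rows of a query into a small XML document that an XSL-FO stylesheet could then consume. SQLite is used only because it ships with Python; the function name `rows_to_xml` and the `rows`/`row` element names are assumptions, not part of any tool mentioned here.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(connection, query, root_tag="rows", row_tag="row"):
    """Serialize the result of a SQL query as XML text.

    Each row becomes one <row> element and each column one child element.
    Column names are used as element names, so they must be valid XML names.
    """
    cursor = connection.execute(query)
    columns = [description[0] for description in cursor.description]
    root = ET.Element(root_tag)
    for record in cursor:
        row = ET.SubElement(root, row_tag)
        for column, value in zip(columns, record):
            # NULL values are written as empty elements.
            ET.SubElement(row, column).text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")
```

For example, with a hypothetical `article` table, `rows_to_xml(conn, "SELECT title, body FROM article")` gives one `<row>` per article, with special characters such as `&` escaped by the serializer.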
I use the ReportLab python library for this. It could perhaps solve your problem, but you will need to do some work...
In the past I have written scripts that spit out LaTeX then used texi2pdf to solve this kind of problem.
Take a look at iReport and JasperReports at http://jasperforge.org.
iReport lets you design reports, and then you can either programmatically fill them with the JasperReports library (Java), or just use iReport to manually create the report.
I have only used it for tabular data, but I don't think there would be any problem for other types of documents.
You could create a form and populate the entries programmatically using a pdf library like iText (Java).
You could look at doing the workflow in PostScript which is plain text that you can easily compose from fragments. Then you can use any free tool to convert to PDF.
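To show what composing PostScript from fragments can mean in practice, here is a minimal Python sketch that concatenates a fixed prolog, one text fragment per line, and a closing `showpage`. The helper names and the page coordinates are assumptions chosen for illustration. The resulting file can be converted with a tool such as Ghostscript's `ps2pdf`.

```python
# Fixed prolog: file header plus font selection, shared by every page.
PROLOG = "%!PS-Adobe-3.0\n/Helvetica findfont 12 scalefont setfont\n"


def ps_escape(text):
    """Escape backslashes and parentheses inside a PostScript string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def text_fragment(x, y, text):
    """Return the PostScript fragment that draws one line of text at (x, y)."""
    return "%d %d moveto (%s) show\n" % (x, y, ps_escape(text))


def build_page(lines, left=72, top=720, leading=16):
    """Compose a one-page PostScript program from per-line fragments.

    Coordinates are in points; the defaults are arbitrary example values.
    """
    parts = [PROLOG]
    for index, line in enumerate(lines):
        parts.append(text_fragment(left, top - index * leading, line))
    parts.append("showpage\n")
    return "".join(parts)
```

Because each fragment is plain text, graphics prepared by a designer could be kept as separate fragments and spliced in the same way.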
Take a look at Prince XML. This tool can generate PDFs from XML or HTML and CSS.
A possible way is to use a template engine, like FreeMarker or StringTemplate: these are often used to generate HTML, but they are flexible enough to output any format, actually.
The problem is to make a PDF template, I suppose. Perhaps you can take a sample output and edit it to replace data with placeholders to be filled by the template engine. Might not be trivial!
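A minimal sketch of the placeholder idea, using Python's built-in `string.Template` as a stand-in for FreeMarker or StringTemplate (it is a different and much simpler engine). The template text and the field names `headline` and `summary` are invented for the example. The output here is HTML, which would still need an HTML-to-PDF step afterwards.

```python
from html import escape
from string import Template

# Hypothetical template: in practice the designer's layout would go here,
# with $placeholders wherever revised content has to be filled in.
PAGE_TEMPLATE = Template("""\
<html>
  <body>
    <h1>$headline</h1>
    <p>$summary</p>
  </body>
</html>
""")


def render(template, data):
    """Fill the template with HTML-escaped values.

    substitute() raises KeyError when a placeholder has no value,
    which makes content that is out of sync with the template visible.
    """
    safe = {key: escape(str(value)) for key, value in data.items()}
    return template.substitute(safe)
```

With this split, a content revision only changes the data passed to `render`, and the template itself stays untouched.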
Sounds like a job that SQL Server Reporting Services can handle quite easily.
Reporting Services allows you to query the data, define the layout, and export to PDF without any intervention. The PDF output can be distributed via email, stored on a file share, and accessed via a page on the report server.
It can handle XML data sources too.
Another approach to generating a PDF file from data is to use Prawn, a Ruby library. I was very pleasantly surprised by how much functionality is included in Prawn. It may take some investment up front, but this approach will give you a lot of flexibility.
You can combine CSStoXSLFO with XEP from RenderX for high quality output. With this solution you can merge XML data into an XHTML template, which is decorated with CSS. It can also generate charts with the fantastic JFreeChart library. CSS3 paged media features are supported.