Generate PDF from structured data

I want to be able to generate a highly graphical (with lots of text content as well) PDF file from data that I might have in a database or xml or any other structured form.
Currently our graphic designer creates these PDF files in Photoshop manually after getting the content as an MS Word document. But usually there are more than 20 revisions of the content: small changes here and there, spelling corrections, etc.
The 2 disadvantages are:
1) The graphic designer's time is unnecessarily occupied. The first version is the only one he/she should have to work on.
2) The PDF file becomes the document which now has the final revised content, and the initial content is out of sync with it. So if the initial content needs to be somewhere else (like on a website), we need to recreate it from the PDF file.
Generating the PDF file will help me solve both these problems. Perhaps some way in which the graphic designer creates a "Template" and then puts in tags/holders and maps these tags/holders to the relevant data.
Thanks :-)

There are some tools out there for doing this. XSL-FO is useful. Here is a tutorial for creating a pdf from xml (or xhtml) with cocoon. Also see Apache FOP.
You could format your SQL data as XML and still use the same templates this way.
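As a rough illustration of formatting SQL data as XML, here is a minimal Python sketch using only the standard library. The `products` table, the `items`/`item` element names and the helper `rows_to_xml` are made up for the example; a real XSL-FO stylesheet would dictate the element names it expects.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(connection, query, root_tag="items", row_tag="item"):
    """Run a query and return an XML string with one element per row."""
    cursor = connection.execute(query)
    # Assumes column names are valid XML element names.
    columns = [description[0] for description in cursor.description]
    root = ET.Element(root_tag)
    for row in cursor:
        row_element = ET.SubElement(root, row_tag)
        for column, value in zip(columns, row):
            # Each column becomes a child element; NULL becomes empty text.
            child = ET.SubElement(row_element, column)
            child.text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample data standing in for the real database.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE products (name TEXT, price REAL)")
connection.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("Widget", 9.5), ("Gadget & Co", 20.0)],
)
xml_text = rows_to_xml(connection, "SELECT name, price FROM products ORDER BY name")
print(xml_text)
```

The resulting XML can then be fed to the same stylesheet as hand-written XML content, so the designer's template does not need to know where the data came from.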

I use the ReportLab python library for this. It could perhaps solve your problem, but you will need to do some work...

In the past I have written scripts that spit out LaTeX then used texi2pdf to solve this kind of problem.
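A script of that kind might look roughly like the sketch below (Python, standard library only). The template text, the `@` placeholder delimiter and the function names are assumptions for illustration, not anything prescribed by LaTeX or texi2pdf.

```python
from string import Template

# Characters with special meaning in LaTeX and their escaped forms.
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def latex_escape(text):
    """Escape one character at a time so nothing is escaped twice."""
    return "".join(LATEX_ESCAPES.get(char, char) for char in text)


class LatexTemplate(Template):
    # "@" avoids clashing with "$", which LaTeX uses for math mode.
    delimiter = "@"


# Hypothetical template; a designer would own the real preamble and layout.
DOCUMENT = LatexTemplate(r"""\documentclass{article}
\begin{document}
\section*{@title}
@body
\end{document}
""")


def render(title, body):
    """Return LaTeX source with the escaped content filled in."""
    return DOCUMENT.substitute(title=latex_escape(title), body=latex_escape(body))


latex_source = render("Q3 Report", "Revenue grew 5% at R&D_Labs.")
print(latex_source)
```

Saving the output as, say, `report.tex` and running `texi2pdf report.tex` would then produce the PDF. Escaping the content matters because revised copy will sooner or later contain a `%` or `&`.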

Take a look at iReport and JasperReports at http://jasperforge.org.
iReport lets you design reports, and then you can either programmatically fill them with the JasperReports library (Java), or just use iReport to manually create the report.
I have only used it for tabular data, but I don't think there would be any problem for other types of documents.

You could create a form and populate the entries programmatically using a pdf library like iText (Java).

You could look at doing the workflow in PostScript which is plain text that you can easily compose from fragments. Then you can use any free tool to convert to PDF.
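To show what composing PostScript from fragments can look like, here is a small Python sketch. The function names, page coordinates and sample text are invented for the example, and it assumes plain ASCII text; fonts and encodings need more care in practice.

```python
def ps_string(text):
    """Escape backslashes and parentheses for a PostScript string literal."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "(" + escaped + ")"


def text_page(lines, left=72, top=720, leading=16, font="Helvetica", size=12):
    """Return a PostScript fragment that draws the lines on one page."""
    fragment = ["/%s findfont %d scalefont setfont" % (font, size)]
    for index, line in enumerate(lines):
        # PostScript's origin is the bottom-left corner, so y decreases per line.
        y = top - index * leading
        fragment.append("%d %d moveto %s show" % (left, y, ps_string(line)))
    fragment.append("showpage")
    return "\n".join(fragment)


def document(pages):
    """Join a header and the page fragments into a complete PostScript file."""
    return "\n".join(["%!PS-Adobe-3.0"] + pages + ["%%EOF"]) + "\n"


postscript = document([
    text_page(["Price list", "Widget (small): 9.50"]),
    text_page(["Page two"]),
])
print(postscript)
```

The text could be written to a `.ps` file and converted with a tool such as Ghostscript's `ps2pdf`.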

Take a look at Prince XML. This tool lets you generate PDF from XML or HTML and CSS.

A possible way is to use a template engine, like FreeMarker or StringTemplate: these are often used to generate HTML, but they are flexible enough to output any format, actually.
The problem is to make a PDF template, I suppose. Perhaps you can take a sample output and edit it to replace data with placeholders to be filled by the template engine. Might not be trivial!
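FreeMarker and StringTemplate are Java libraries; the sketch below swaps in Python's built-in `string.Template` purely to illustrate the idea. It fills an HTML template rather than a PDF one, on the assumption that an HTML-to-PDF tool does the final rendering. The template, the stylesheet name `brochure.css` and the field names are hypothetical.

```python
from html import escape
from string import Template

# Hypothetical template; the designer would own this markup and its CSS.
PAGE = Template("""<html>
<head><link rel="stylesheet" href="$stylesheet"></head>
<body>
<h1>$title</h1>
<p>$summary</p>
</body>
</html>
""")


def render_page(record, stylesheet="brochure.css"):
    """Fill the template, escaping every value so content cannot break the markup."""
    values = {key: escape(str(value)) for key, value in record.items()}
    return PAGE.substitute(values, stylesheet=escape(stylesheet))


record = {"title": "Fish & Chips", "summary": "Served daily, price < 10."}
html_page = render_page(record)
print(html_page)
```

With this split, a content revision only changes `record`, and the layout is untouched.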

Sounds like a job that SQL Server Reporting Services can handle quite easily.
Reporting Services allows you to query the data, define the layout, and export to PDF without any intervention. The PDF output can be distributed via email, stored on a file share, and accessed via a page on the report server.
It can handle XML data sources too.

Another approach to generating a PDF file from data is to use Prawn, a Ruby library. I was very pleasantly surprised by how much functionality is included in Prawn. It may take some investment up front, but this approach will give you a lot of flexibility.

You can combine CSStoXSLFO with XEP from RenderX for high quality output. With this solution you can merge XML data into an XHTML template, which is decorated with CSS. It can also generate charts with the fantastic JFreeChart library. CSS3 page media features are supported.

Related

ReadTheDocs generates PDFs without my HTML tables

We are converting a sizeable document for hosting on ReadTheDocs. We weren't happy with the simple presentation enabled by Markdown table syntax, so we coded our tables as HTML. Very nice in the HTML viewer (e.g., the end of http://manual.cytoscape.org/en/latest/Command_Line_Arguments.html).
In the PDF version generated by ReadTheDocs, each of our tables is completely missing (see page 9 on https://media.readthedocs.org/pdf/cytoscape-working-copy/latest/cytoscape-working-copy.pdf).
Have we made a mistake by coding tables as HTML? Could we have taken a different route and gotten nice tables in both HTML and PDF?
Any advice would be helpful ...
Thanks!
I have not used ReadTheDocs myself, but from reading their Getting Started guide, I assume you are using Sphinx? While Markdown supports embedding raw HTML, Sphinx does not support converting it to other formats.
You should consider moving to reStructuredText (Sphinx's native markup format), as it is much more advanced than Markdown. It can even be extended with custom directives and roles, should you need this. But be sure to first check whether reStructuredText tables offer the flexibility you require. Pandoc can convert your Markdown files to reStructuredText.
I see you are using a table to document command line options. reStructuredText supports documenting command line options using option lists. In theory, you could change how option lists are represented in the output document, but this might not be easy to accomplish, especially for PDF output using LaTeX (shameless plug: using rinohtype for PDF output should make this much easier in the future).

writing text to a pdf file

I have several PDF files (about 20), and every month or so I need to change some fields with new data. This is a very time-consuming task, and I would like to know if there is an easy way, via some sort of application, for users to change the values of the variables that are stored in the different PDF files. This would be an enormous time saver. Thanks for any help.
There are lots of solutions for this. If you are willing to write some code, things can get really interesting.
A simple solution would be to create a template PDF file with placeholder fields (like #{name}, #{age}, etc.). When a new PDF needs to be created with new values, you can simply use iText to edit the PDF and replace the placeholders with the actual values.
You can also use JasperReports for this, but it would be overkill for just 20-odd documents.
If you are interested in a sample program, I'd be happy to provide you with one.
If you have form fields in the PDF file then you may use Aspose.Pdf (.NET or Java version) to fill data into those fields programmatically. You can either fill the fields using individual values or import the data from the XML/FDF/XFDF files etc. You can take a template PDF and save the output PDF files with different values. Please see if this might help in your scenario.
Disclosure: I work as developer evangelist at Aspose.
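For the form-field route, the new values can be kept in a small XFDF file per document and imported by whatever PDF library you use. The Python sketch below only builds the XFDF data. The field names are hypothetical and must match the form field names in your PDF, and how the file is imported depends on the library, so check its documentation.

```python
import xml.etree.ElementTree as ET

XFDF_NS = "http://ns.adobe.com/xfdf/"


def build_xfdf(field_values):
    """Return an XFDF document (as a string) holding one value per form field."""
    # Serialize the XFDF namespace as the default namespace (no prefix).
    ET.register_namespace("", XFDF_NS)
    root = ET.Element("{%s}xfdf" % XFDF_NS)
    fields = ET.SubElement(root, "{%s}fields" % XFDF_NS)
    for name, value in field_values.items():
        field = ET.SubElement(fields, "{%s}field" % XFDF_NS, {"name": name})
        ET.SubElement(field, "{%s}value" % XFDF_NS).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical field names and values for one monthly update.
xfdf_text = build_xfdf({"customer": "Ada & Sons", "month": "March", "total": 1250})
print(xfdf_text)
```

Users then only edit the values, and the 20 template PDFs stay unchanged between months.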

Understanding the PDF DOM

I am writing an application that has to read and interpret data stored in some PDF files. The reading part is done but I am only able to get a dump of all the words on a page and not the format of the words. What I mean is that if I have to extract a table, I am getting the numbers in the table but not the markup which defines the table.
Further, there is some formatting used which displays a few of these numbers within parentheses (meaning that those numbers are negative) but the parentheses themselves are not part of the text. Hence, I am not able to distinguish between positive and negative numbers present in the PDF table!
How do you get the PDF markup along with the text? Is a PDF similar in structure to an XML with tags used to markup tables etc.? If not, then, is there a resource which describes the salient features of the PDF DOM?
I am using VBA and the Acrobat library (AcroExch etc.)
There is no such thing as "PDF markup" in the sense of HTML etc. A table in PDF cannot be distinguished from line art, other than by using OCR, which can be error-prone if the layout is complex. It is simply drawn using geometrical shapes, like in a vector-based graphics program.
"Is a PDF similar in structure to an XML with tags used to markup tables etc.?"
No, not at all.
And there is no such thing as a 'DOM' either. Google for a file named *PDF32000_2008.pdf*. The current PDF specification for v1.7 (ISO spec) is that file. You should be able to locate it on the Adobe website.
As omz stated, text inside a PDF does not really have a structure. You can take a look at the specification here. However, for some very specific files, there is something called PDF Tags, or PDF Marked Content, which is fairly new and aims to give PDF documents some kind of structure. If you target this kind of file specifically, you might be able to achieve something. Take a look at chapter 10 (Document Interchange) of Adobe's specification for further details.
Maybe what you want to achieve can be done faster and with less effort by using TET, the Text Extraction Toolkit made by the fine folks from pdflib.com (http://www.pdflib.com/products/tet/)?
AFAIR, the TET has some (limited) support for table detection as well....

Generating PDF documents from LISP

I want to generate a technical report from Lisp (AllegroCL in my case), and I studied various packages/projects to help me do this.
Requirements:
Need to generate a PDF
May create an intermediate format like RTF, reStructuredText, HTML, Word DOC or LaTeX
Need to be flexible to be able to add content throughout my application
Need to handle Multi-Page, Headers, Footers, Tables, inclusion of Images.
Possibilities:
cl-pdf and cl-typesetting: I checked this one out and it works for now, but is there a better alternative?
Some Latex generator, but ???
Question:
Do you know of alternatives for easily generating (PDF) reports from Lisp? What is the best workflow to go for?
We have been using cl-pdf and cl-typesetting for the last 3 years and they have numerous issues (like confusion around encodings, or silently not rendering things that don't fit, or...), so I don't recommend new development based on them.
Currently we are in the process of moving all our export mechanisms to the OpenDocument format. OpenOffice is all happy with it, and there's a plugin for MS Office, too.
There's .fodt, the so-called flat OpenDocument text format, which is a mere XML file describing a document. Generating it is as easy as generating XML files.
You can also make parts of your document read-only with a password (insert a section and mark it read-only and protected by a password; when generating the XML, you can generate random hashes as the password...).
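To give a feel for how little is needed for a flat OpenDocument text file, here is a minimal sketch. It is written in Python rather than Lisp, purely as an illustration of the XML involved; the function name and sample paragraphs are invented, and a real report would add styles, headers, tables and images.

```python
import xml.etree.ElementTree as ET

OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
ET.register_namespace("office", OFFICE)
ET.register_namespace("text", TEXT)


def build_fodt(paragraphs):
    """Return a flat OpenDocument text file with one text:p per paragraph."""
    root = ET.Element("{%s}document" % OFFICE, {
        "{%s}version" % OFFICE: "1.2",
        "{%s}mimetype" % OFFICE: "application/vnd.oasis.opendocument.text",
    })
    body = ET.SubElement(root, "{%s}body" % OFFICE)
    text = ET.SubElement(body, "{%s}text" % OFFICE)
    for paragraph in paragraphs:
        ET.SubElement(text, "{%s}p" % TEXT).text = paragraph
    declaration = '<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(root, encoding="unicode")


fodt_text = build_fodt(["Technical report", "Generated from application data."])
print(fodt_text)
```

Saved with a `.fodt` extension, such a file should open in OpenOffice or LibreOffice, which can export it to PDF (for example with something like `soffice --headless --convert-to pdf report.fodt`).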

How can I create a PDF file in classic ASP?

Is there any way to generate PDF files from classic ASP? I have a bunch of user-entered data that needs to be turned into a PDF that the user can download. How can I do this? OpenOffice allows exporting documents to PDF, so could this somehow be leveraged?
I played around a bit with this (Persits ASPPDF): http://www.asppdf.com/
Maybe running an external application that could be using CrystalReports... and you just pass it as an xml?
That's how I would do it... (lazy mode)
See a full list of PDF components here: http://www.aspin.com/home/components/document/pdf Many of them are free.
It is also possible to use XSLT to output PDF, but I am not sure if this is supported by the Microsoft XML Parser. I remember there was something stopping me when I tried to do this 3-4 years ago. It might be worth checking out now, depending on the type of data you have as a source.
However if these are static files or a one time job consider using a PDF converter on your computer and just upload the files to the server. There are heaps of tools for this, including Adobe Acrobat.