Command-line tool for generating PDFs from various document types?

I'm looking for ways to generate PDFs on the fly, preferably using a command-line tool, as this will be done from a web-based system.
My requirements: it must work on Windows and Linux, and it should be able to convert Microsoft Word, Excel and HTML documents into PDF.
The ability to concatenate or merge several documents into one PDF output file would also be good.
Any suggestions? I would prefer to avoid applications that work as "printer drivers".
Many thanks.

After some research, the best solution I found that could handle all the file formats we needed to convert, and that ran on both Linux and Windows, was PyODConverter, an elegant, lightweight Python script. It uses OpenOffice (itself running in server mode) to do the actual conversions, and it works very well. I used a separate tool, PDFTK, for the PDF concatenation, because I found that ImageMagick loses a lot of information (and creates huge files).
If you find PyODConverter too limited, there is also a more powerful, heavier option by the same author called JODConverter.
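A minimal sketch of the convert-then-concatenate pipeline, in Python. It assumes LibreOffice's `soffice` binary (used here in place of PyODConverter, through its `--convert-to` mode) and `pdftk` are both on PATH; the function names are illustrative and not part of either tool.

```python
import subprocess
from pathlib import Path

# Assumption: LibreOffice's "soffice" and "pdftk" are both on PATH.
SOFFICE = "soffice"
PDFTK = "pdftk"


def convert_cmd(src: Path, outdir: Path) -> list[str]:
    """Command that converts one Word or Excel file to PDF inside outdir."""
    return [SOFFICE, "--headless", "--convert-to", "pdf",
            "--outdir", str(outdir), str(src)]


def merge_cmd(pdfs: list[Path], target: Path) -> list[str]:
    """pdftk's 'cat' operation concatenates its inputs in the given order."""
    return [PDFTK, *(str(p) for p in pdfs), "cat", "output", str(target)]


def convert_and_merge(sources: list[Path], outdir: Path, target: Path) -> None:
    outdir.mkdir(parents=True, exist_ok=True)
    pdfs = []
    for src in sources:
        if src.suffix.lower() == ".pdf":
            pdfs.append(src)  # already a PDF, nothing to convert
            continue
        subprocess.run(convert_cmd(src, outdir), check=True)
        # soffice names its output after the input file, with a .pdf extension.
        pdfs.append(outdir / (src.stem + ".pdf"))
    subprocess.run(merge_cmd(pdfs, target), check=True)
```

For example, `convert_and_merge([Path("report.docx"), Path("budget.xlsx")], Path("tmp"), Path("bundle.pdf"))` would produce `bundle.pdf` containing the two converted documents in that order. Two inputs with the same stem (say `report.docx` and `report.xlsx`) would overwrite each other's PDF in `outdir`, so a real system would want unique names.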

Calibre runs on Linux, Windows, and Mac OS X and has command line tools on all three. It can translate a great many document types to PDF and other formats.
(Disclaimer: I'm a heavy user, help out on Calibre's IRC channel, and have been poking at development, so I'm just a bit biased.)
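A sketch of driving Calibre from a script, assuming its `ebook-convert` command is on PATH. `ebook-convert` infers the input and output formats from the file extensions; which input formats are accepted (HTML, RTF, DOCX in newer releases, and so on) depends on the Calibre version, so it is worth checking `ebook-convert --help` first.

```python
import shutil
import subprocess
from pathlib import Path


def ebook_convert_cmd(src: Path, dst: Path) -> list[str]:
    # ebook-convert chooses input and output formats from the file extensions.
    return ["ebook-convert", str(src), str(dst)]


def to_pdf(src: Path) -> Path:
    """Convert src to a PDF next to it using Calibre's ebook-convert."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("Calibre's ebook-convert is not on PATH")
    dst = src.with_suffix(".pdf")
    subprocess.run(ebook_convert_cmd(src, dst), check=True)
    return dst
```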

I think PDFCreator has a command-line utility, but I'm not sure. Check it out:
PDFCreator

Have a look at biopdf and at Bullzip PDF, a PDF printer that is built on it. The Bullzip PDF documentation has examples of how it can be automated; it has an API as well as the GUI.

Related

Working with Excel files in web app frameworks like Seaside

I've been reading about Seaside and like the sound of it, but I can't see an easy way of handling data files, primarily importing Excel. CSV files would of course be more straightforward, but are there any ways to import the various Excel formats (xls, xlsx) without writing your own file-parsing routines?
I've heard that the need to open Excel files would be a good reason to choose a Windows-based system like .NET. What do you think?
There are various Smalltalk implementations that support Seaside and have excellent integration with the Windows platform: Dolphin Smalltalk, VA Smalltalk, and Cincom Smalltalk. I assume that it is possible to call Excel from any of these.
There are various command line tools available that you could call to convert an XLS file to something you can easily parse (like CSV).
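For instance, LibreOffice can do the XLS-to-CSV step headlessly. The sketch below is in Python only for brevity (the same two steps apply from Smalltalk); it assumes `soffice` is on PATH and that, by default, only one sheet of the workbook ends up in the CSV. `parse_rows` then reads the result with the standard `csv` module, treating the first row as headers.

```python
import csv
import io
import subprocess
from pathlib import Path


def xls_to_csv_cmd(src: Path, outdir: Path) -> list[str]:
    # Assumption: LibreOffice "soffice" on PATH; one sheet is exported by default.
    return ["soffice", "--headless", "--convert-to", "csv",
            "--outdir", str(outdir), str(src)]


def convert(src: Path, outdir: Path) -> Path:
    subprocess.run(xls_to_csv_cmd(src, outdir), check=True)
    # The CSV is named after the input file.
    return outdir / (src.stem + ".csv")


def parse_rows(text: str) -> list[dict[str, str]]:
    # The first row is treated as the header row.
    return list(csv.DictReader(io.StringIO(text)))
```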
However, I think the most elegant solution (also from an end-user perspective) is the one used by Magic/Replace.
Just an answer to the second part of your question: no, that is not a good reason. You definitely do not want to run Microsoft Office as a server process (I have never tried it with OpenOffice, but that should work somewhat better).
It is not stable, and there are licensing issues you have to be aware of.
I've worked on a Seaside app that read and wrote Excel sheets on Linux. Here's what I did:
First, I had OpenOffice run in the background and convert all Office files to OpenDocument format,
and then I imported those into Squeak using some code by Takashi Yamamiya. A word of warning: at the time I used it, Excel import and export didn't work at all; it took me an afternoon and a bit of hacking to get it running, but then it went fine. (Niko, why didn't you push the changes back online? Well, you see … ehh, I forgot. And now they're somewhere well hidden on my disk and I don't feel like searching.)
And that's it. It wasn't even slow; just make sure that OOo is running constantly in the background.
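The OpenDocument detour works well because an `.ods` file is just a zip archive whose cells live in `content.xml`. Takashi Yamamiya's Squeak code is not reproduced here; this Python sketch only illustrates the format, assuming the workbook was first converted with something like `soffice --headless --convert-to ods book.xlsx`. It returns the cell text of each sheet and expands `table:number-columns-repeated`.

```python
import zipfile
import xml.etree.ElementTree as ET

# OpenDocument namespaces used in content.xml.
NS = {
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}
T = "{%s}" % NS["table"]


def read_ods(path) -> dict[str, list[list[str]]]:
    """Return {sheet name: rows of cell strings} from an .ods file or file object."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    sheets = {}
    for table in root.iter(T + "table"):
        rows = []
        for row in table.iter(T + "table-row"):
            cells = []
            for cell in row.findall("table:table-cell", NS):
                # A cell's text lives in one or more text:p children.
                value = "\n".join("".join(p.itertext())
                                  for p in cell.findall("text:p", NS))
                cells.extend([value] * int(cell.get(T + "number-columns-repeated", "1")))
            # Drop trailing empty cells produced by repeated blank columns.
            while cells and cells[-1] == "":
                cells.pop()
            rows.append(cells)
        sheets[table.get(T + "name")] = rows
    return sheets
```

It deliberately ignores `table:number-rows-repeated`, merged cells (`table:covered-table-cell`) and typed values such as dates, returning only the displayed text, so treat it as a starting point.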

Any way to automatically convert DWF to PDF?

Our eTendering solution, www.monaqasat.com, currently works exclusively with PDF documents for various reasons, security among them. We are being asked whether we can support DWF documents. For this to happen, we would need a way to automatically convert DWF documents to PDF, using some kind of Unix application.
Does anybody know of such an application, preferably one usable from Rails or Java?
Thanks,
.Karim
http://www.autodwg.com/pdf/
http://www.dwgto.com/
http://www.aidecad.com/
http://en.wikipedia.org/wiki/List_of_PDF_software
http://www.cogniview.com/convert-pdf-to-excel/category/pdf/
My suggestion would be to install a software printer, call its API to pass in the DWF and get back a PDF, and then apply security as needed.
Autodesk has its DWF Toolkit available at
http://www.autodesk.com/dwftoolkit
It contains full source code in C++ to read & write DWF files, so it should be reasonably easy to make it run under Linux and to use a PDF library to write the output.

A technology for reading PDFs online with annotations?

Is there an open-source solution that displays PDFs for online reading? It has to be searchable, much like Google Books, and if possible it should be able to display annotations.
By "online reading" I'll assume you mean without a PDF reader plugin on the client. In that case you'll need to convert to HTML:
http://pdftohtml.sourceforge.net/
If you don't mind losing the ability to copy text, converting to PNG may give you a more accurate rendering:
http://www.imagemagick.org/
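Both conversions are one command each. A sketch of the command lines in Python, assuming `pdftohtml` (from the project above or from Poppler) and ImageMagick's `convert` are on PATH; ImageMagick 7 names the same tool `magick`, and it rasterises PDF pages through Ghostscript, so that must be installed too.

```python
from pathlib import Path


def pdftohtml_cmd(pdf: Path, out_base: Path) -> list[str]:
    # pdftohtml uses out_base as the file name stem for its HTML output.
    return ["pdftohtml", str(pdf), str(out_base)]


def pdf_to_png_cmd(pdf: Path, out_pattern: str = "page-%03d.png", dpi: int = 150) -> list[str]:
    # -density is a read setting, so it must come before the input file;
    # %03d in the output name numbers the pages (page-000.png, page-001.png, ...).
    return ["convert", "-density", str(dpi), str(pdf), out_pattern]
```

Either list can be passed straight to `subprocess.run(..., check=True)`; a higher `dpi` gives sharper pages at the cost of larger PNGs.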
Regardless of the output format, you can manage your searching using the original PDF data. One technology for this is mnoGoSearch:
http://www.mnogosearch.org/
mnoGoSearch uses pdftotext internally; you may find this useful if you want to write your own search routines. pdftotext is part of the Xpdf suite of utilities:
http://www.foolabs.com/xpdf/about.html
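If you roll your own search, pdftotext's output is easy to index per page because it ends each page with a form feed character. A minimal sketch, assuming `pdftotext` is on PATH (an output file name of `-` sends the text to stdout); `pages_containing` is a hypothetical helper doing a naive case-insensitive substring match, not what mnoGoSearch does internally.

```python
import subprocess


def pdf_text(pdf_path: str) -> str:
    # An output file name of "-" makes pdftotext write the text to stdout.
    result = subprocess.run(["pdftotext", pdf_path, "-"],
                            check=True, capture_output=True, text=True)
    return result.stdout


def pages_containing(text: str, term: str) -> list[int]:
    """Return 1-based numbers of the pages whose text contains term.

    pdftotext ends every page with a form feed, so splitting on it
    gives one chunk per page. Matching is a naive, case-insensitive
    substring test.
    """
    needle = term.lower()
    if not needle:
        return []
    return [number for number, page in enumerate(text.split("\f"), start=1)
            if needle in page.lower()]
```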
All of the tools listed above are available on Windows or Linux
You may also be interested in the Vuzit DocuPub Platform: http://vuzit.com/products/docupub_platform
The display technology itself is not open source, but they provide an API to access their service, so perhaps it is worth investigating.
I don't know whether you are looking for software to install or for a service to pay for...
I've read a lot about www.getbackboard.com (this is not advertising, only reporting something I've read about that may fit your needs ;)
Not sure if they do annotations, but both of these will show PDFs quite well:
http://pdfmenot.com
http://docs.google.com
ICEpdf recently released its code as open source. It is Java-based.
pyPdf is really nice. It supports reading text as well as encryption, which I know iTextSharp does not.
Of course, you'd have to program in Python, as IronPython's class libraries aren't quite at the point where you can reference them from another language and use them. (But I imagine they will be someday soon.)
pyPdf
This is not open source, but check it out anyway. You can download a free trial of their SDK. Reading PDFs and their annotations is not simple, and I wouldn't trust a production app to open-source decoders.
Here is an online demo.
http://www.atalasoft.com/ajaxannotations/default.aspx
Another good PDF reader is Foxit Reader.

Is there a free way to convert RTF to PDF?

How can I programmatically convert RTF documents to PDF?
OpenOffice.org can be run in server mode (i.e. without any GUI), can read RTF files and can output PDF files.
You have a number of options, depending on:
- the platform(s) your application will run on
- whether your application will be a server application (e.g. a web service that you set up once and then leave running) or a widely available desktop application (e.g. something that must be easy for many people to download and install)
- how much programming effort you are willing to put into getting the solution to work
- whether you are flexible as to the programming language you will use
Here are some options:
1. PDFCreator + COM
   - Windows only
   - suitable for both desktop and server applications
   - medium programming effort
   - any language that can speak COM
2. OpenOffice (+ JODConverter, optional)
   - cross-platform (Windows, Linux, etc.)
   - suitable mainly for server applications, as OpenOffice is a 100 MB+ download
   - low programming effort
   - Java (if using JODConverter), or any language that can interface with OpenOffice's UNO
3. iText + Apache POI
   - cross-platform (Windows, Linux, etc.)
   - suitable for both desktop and server applications
   - high programming effort
   - Java
EDIT
Here is an older post that has some commonality with your question.
EDIT 2
I see from your comments that you are on Linux and open to either C++ or Java. Definitely use option 2.
JODConverter (Java): the library takes care of spawning OpenOffice in headless mode and talking UNO to it on your behalf. You give JODConverter the input and output file names as well as the input and output types (e.g. rtf and pdf), and when the call returns, the output file is ready.
C++: you can fork+exec one (or more, for load balancing) OpenOffice instances in headless mode (soffice will listen for UNO requests on a socket, e.g. port 8100). From your application, use the UNO C++ binding (Uno/CPP) to instruct OpenOffice to perform the conversion the same way JODConverter does (see the JODConverter source code for how to do this).
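```sh
# Start a headless OpenOffice.org 3 instance that accepts UNO connections on port 8100.
# The -accept value must be quoted; otherwise the shell treats ";" as a command separator.
/opt/openoffice.org3/program/soffice.bin \
  -accept="socket,host=127.0.0.1,port=8100;urp;" \
  -headless -nocrashreport -nodefault \
  -nolockcheck -nologo -norestore
```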
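From a program, the same fork+exec can be done without a shell. This Python sketch starts one or more listeners with the flags shown above and polls each TCP port until it accepts connections; `SOFFICE_BIN` reuses the path from the command above, and the helper names are illustrative. It does not perform the conversion itself, which is the UNO part handled by JODConverter or your own Uno/CPP code.

```python
import socket
import subprocess
import time

# Path taken from the command above; adjust for your installation.
SOFFICE_BIN = "/opt/openoffice.org3/program/soffice.bin"


def listener_cmd(port: int = 8100) -> list[str]:
    # No shell is involved, so the ";" inside -accept needs no quoting here.
    return [SOFFICE_BIN, f"-accept=socket,host=127.0.0.1,port={port};urp;",
            "-headless", "-nocrashreport", "-nodefault",
            "-nolockcheck", "-nologo", "-norestore"]


def wait_for_port(port: int, host: str = "127.0.0.1", timeout: float = 30.0) -> bool:
    """Poll until something accepts TCP connections on host:port."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.2)
    return False


def start_instances(ports: list[int]) -> list[subprocess.Popen]:
    """Start one headless instance per port, e.g. for simple load balancing."""
    procs = [subprocess.Popen(listener_cmd(p)) for p in ports]
    for p in ports:
        if not wait_for_port(p):
            raise RuntimeError(f"soffice did not start listening on port {p}")
    return procs
```

Running several instances side by side generally also requires a separate user profile per instance (OpenOffice and LibreOffice accept `-env:UserInstallation=file:///some/dir` for this); otherwise the instances can interfere with each other.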
I am successfully using JODConverter from a Java app to convert miscellaneous document types (some documents dynamically generated from templates) to pdf.
Four years late to the party here, but I use Ted in my web application. I generate RTF programmatically, then use the rtf2pdf.sh script included in the package to generate the PDF. I tried OOo and unoconv previously, but Ted proved faster and more reliable in my application.
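Generating RTF yourself is mostly a matter of escaping. A hypothetical Python sketch that builds a tiny document with a bold title and body paragraphs; it escapes `\`, `{` and `}` and writes non-ASCII characters as `\uN?` escapes (signed 16-bit UTF-16 code units, with `?` as the fallback for old readers). The resulting file can then be handed to a converter such as Ted's rtf2pdf.sh or LibreOffice.

```python
def rtf_escape(text: str) -> str:
    """Escape text for use inside an RTF group."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)          # RTF control characters
        elif ch == "\n":
            out.append("\\line ")          # manual line break
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # \uN takes a signed 16-bit UTF-16 code unit; the "?" after it is
            # the fallback character shown by readers that ignore \u.
            data = ch.encode("utf-16-le")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "little")
                if unit > 0x7FFF:
                    unit -= 0x10000
                out.append(f"\\u{unit}?")
    return "".join(out)


def make_rtf(title: str, paragraphs: list[str]) -> str:
    """Build a minimal RTF document: a bold title followed by body paragraphs."""
    parts = [r"{\rtf1\ansi\deff0{\fonttbl{\f0 Times New Roman;}}",
             r"\pard\b\fs32 " + rtf_escape(title) + r"\b0\par"]
    for para in paragraphs:
        parts.append(r"\pard\fs24 " + rtf_escape(para) + r"\par")
    parts.append("}")
    return "\n".join(parts)
```

Because every non-ASCII character is escaped, the output is plain ASCII and can be written with `encoding="ascii"` before running the converter on it.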
Use PDFCreator, a free PDF printer. Just print to PDF. You can control it through COM; example code is in the COM folder of the install directory.
PDFCreator for Windows is the easiest option for single documents.
It's also possible to automate PDF creation for large sets of documents by converting them to XML and using XSLT and XSL-FO. There are lots of tutorials for this out there.
For a specific language, such as Python, libraries exist to output PDF fairly trivially.
The only advantage of XML over other simpler solutions is extensibility. You could also programmatically output your document in RTF, HTML, TXT, or just about any other text format.
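As an illustration of the XSL-FO route, here is a Python sketch that builds a minimal XSL-FO document directly with `xml.etree.ElementTree` and prepares an Apache FOP command line. Both choices are assumptions: the answer does not name a formatter, and in a full XML + XSLT setup you would instead let FOP apply the stylesheet (`fop -xml data.xml -xsl style.xsl -pdf out.pdf`).

```python
import xml.etree.ElementTree as ET

FO = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO)


def fo_document(paragraphs: list[str]) -> bytes:
    """Build a minimal XSL-FO document with one A4 page master."""
    root = ET.Element(f"{{{FO}}}root")
    masters = ET.SubElement(root, f"{{{FO}}}layout-master-set")
    master = ET.SubElement(masters, f"{{{FO}}}simple-page-master", {
        "master-name": "A4", "page-width": "210mm",
        "page-height": "297mm", "margin": "20mm"})
    ET.SubElement(master, f"{{{FO}}}region-body")
    seq = ET.SubElement(root, f"{{{FO}}}page-sequence", {"master-reference": "A4"})
    flow = ET.SubElement(seq, f"{{{FO}}}flow", {"flow-name": "xsl-region-body"})
    for text in paragraphs:
        block = ET.SubElement(flow, f"{{{FO}}}block", {"space-after": "6pt"})
        block.text = text  # ElementTree escapes &, < and > when serialising
    return ET.tostring(root, xml_declaration=True, encoding="utf-8")


def fop_cmd(fo_file: str, pdf_file: str) -> list[str]:
    # Apache FOP renders an .fo file straight to PDF.
    return ["fop", "-fo", fo_file, "-pdf", pdf_file]
```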
LibreOffice can convert RTF documents to PDF via command line.
Here are the instructions to install it on CentOS.
And this is an example to initiate conversion from PHP code:
```php
<?php shell_exec('libreoffice4.2 --headless --invisible --norestore --convert-to pdf test.rtf'); ?>
```
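The same call from Python, with a few safeguards that seem reasonable for a web application: a timeout, an explicit `--outdir`, a throwaway user profile (through LibreOffice's `-env:UserInstallation` option) so concurrent requests do not share one profile, and a check that the PDF actually appeared, since a failed conversion is not always reported through the exit code. The binary name is a parameter because it varies between installations (`soffice`, `libreoffice`, `libreoffice4.2`); the function names are illustrative.

```python
import subprocess
import tempfile
from pathlib import Path


def libreoffice_cmd(src: Path, outdir: Path, profile_dir: Path,
                    binary: str = "soffice") -> list[str]:
    return [
        binary,
        # A private, absolute profile directory, given as a file:// URI.
        f"-env:UserInstallation={profile_dir.as_uri()}",
        "--headless", "--norestore",
        "--convert-to", "pdf", "--outdir", str(outdir), str(src),
    ]


def rtf_to_pdf(src: Path, outdir: Path, timeout: float = 120) -> Path:
    with tempfile.TemporaryDirectory() as profile:
        subprocess.run(libreoffice_cmd(src, outdir, Path(profile)),
                       check=True, timeout=timeout)
    pdf = outdir / (src.stem + ".pdf")
    if not pdf.exists():
        raise RuntimeError(f"LibreOffice exited without producing {pdf}")
    return pdf
```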
PrimoPDF. It acts as a virtual printer, so you just print to it, and out pops a PDF.
Look at PDF Printer

How to convert Word and Excel documents to PDF programmatically?

We are developing a small application that, given a directory of PDF files, creates a single PDF file containing all the PDF files in the directory. This is a simple task using iTextSharp. The problem arises if the directory also contains other files, such as Word or Excel documents.
My question is: is there a way to convert Word and Excel documents into PDF programmatically? And even better, is this possible without having the Office suite installed on the computer running the application?
Office 2007 allows for this. I have found PDFCreator to be good (VBA code is included in its sample files), and I have heard that CutePDF is also good. PDFCreator and CutePDF are free.
To work without Office, you would need viewers, as far as I know:
http://www.microsoft.com/downloads/details.aspx?FamilyID=c8378bf4-996c-4569-b547-75edbd03aaf0&displaylang=EN
http://www.microsoft.com/downloads/details.aspx?familyid=95E24C87-8732-48D5-8689-AB826E7B8FDF&displaylang=en
I needed to do this myself, and managed to get it done with .NET and without third-party tools:
MSDN: Saving Word 2007 Documents to PDF and XPS Formats
Pretty simple, about 50 lines of code. However, I think you will need Word 2007 installed on the machine, as well as the Save As PDF add-in.
To convert Word documents to PDF, take a look at jWordConvert, a Java library that can do exactly that. It will not work with Excel files, though, only with Word files. It is Java rather than C#, but you could switch to iText (the Java library that iTextSharp is ported from) instead of iTextSharp.
You can also use a component like activePDF's DocConverter to convert a lot of formats to PDF.
Use PDFMaker, which comes with Adobe Acrobat 7 to 9.
I just used this code: Convert Doc to PDF
I'm surprised Aspose wasn't mentioned here, it's easy, simple, and reliable. Downside is that it is not free.
I've used iTextSharp in the past. It's really good and easy to install (one DLL, I believe). The merge takes a bit of tinkering, so it's not as easy to use as Aspose, but hey, it's free, and that is the best part.
TallPDF.NET (comes with a hefty price tag) allows you to serve dynamic PDF from any .NET application including ASP.NET pages and web services.
PDFEdit (free and open source) is an editor for manipulating PDF documents. It has a GUI version and a command-line interface. Scripting is used to a great extent in the editor and almost anything can be scripted. It is possible to create your own scripts or plugins.
The most common way to convert files to PDF is to print them to a PDF printer driver. There are a number of such drivers; one that I know of that will do the job is Black Ice.
Another is to use Adobe Acrobat's SDK. From memory, it's very expensive.
It's been a while since I have actually done any work with converting PDFs, and the landscape may have changed.