Anyway to automatically convert DWF to PDF? - pdf

Our eTendering solution, www.monaqasat.com, currently works exclusively with PDF documents for various reasons, some of them being security. We are being asked if we can support DWF documents. For this to happen, we would need to find a way to automatically convert DWF documents to PDF, using some kind of Unix application.
Does anybody know any such application, preferably using Rails or Java?
Thanks,
.Karim

http://www.autodwg.com/pdf/
http://www.dwgto.com/
http://www.aidecad.com/
http://en.wikipedia.org/wiki/List_of_PDF_software
http://www.cogniview.com/convert-pdf-to-excel/category/pdf/
Suggestion would be to install a software printer call its APIs and pass dwf and get back pdf and then apply security as needed.

Autodesk has its DWF Toolkit available at
http://www.autodesk.com/dwftoolkit
It contains full source code in C++ to read & write DWF files, so it should be reasonably easy to make it run under Linux and to use a PDF library to write the output.

Related

Using Extracttext from cfpdf to get text of a pdf in coldfusion

Currently, I have a pdf that is not searchable and I am wondering what the best process is for preparing the file for coldfusion so I can index the file.
In particular, I am wondering whether a pdf file needs to be readable before using extracttext in cfpdf to pull the text from it.
I really appreciate the advice and I hope it helps other people who are interested in indexing pdf files with coldfusion.
I was considering extracting the text with Tesseract as suggested here
Performing Optical Character Recognition on PDF's from ColdFusion using a Java or .NET Library?
but if there is a built in feature in coldfusion, I would much rather use that and I think it would be more helpful to other people to know whether coldfusion can natively handle this task.

A technology for reading pdfs online with annotations?

is there an open source solution that displays PDFs for online reading? It has to be searchable much like google books and if possible has the ability to display annotations?
By "online reading" I'll assume you mean without a PDF reader plugin on the client. In that case you'll need to convert to HTML
http://pdftohtml.sourceforge.net/
If you don't mind losing the ability to copy text then converting to PNG may give you a more accurate rendering
http://www.imagemagick.org/
Regardless of the output format you can manage your searching using the original PDF data. One technology for this is mnogosearch
http://www.mnogosearch.org/
Monogosearch uses pdftotext internally, you may find this useful if you want to write your own search routines. pdftotext is part of the Xpdf suite of utilities
http://www.foolabs.com/xpdf/about.html
All of the tools listed above are available on Windows or Linux
You may also be interested in the Vuzit DocuPub Platform: http://vuzit.com/products/docupub_platform
The display technology itself is not open source, but they provide an API to access their service, so perhaps it is worth investigating.
Don't know if you are looking a software to install or some service to pay for...
I've read a lot about www.getbackboard.com (this is not advertising, only reporting something I've read about, that maybe fits your needs.. ;)
Not sure if they do annotations, but both of these will show PDFs quite well:
http://pdfmenot.com
http://docs.google.com
ICEPdf recently released their code as open source. It is Java based.
PyPdf is really nice. It supports reading the text as well as encryption which I know that itextsharp does not.
Of course you'd have to program in python as IronPython's class libraries aren't quite to the point where you can ref them from another language and use them. (But I imagine they will be someday soon)
PyPdf
This is not open source, but check it out anyways. You can download a free trial of their SDK to try it out. Reading PDF's and their annotations is not simple and I wouldn't trust a production app to open source decoders.
Here is an online demo.
http://www.atalasoft.com/ajaxannotations/default.aspx
Another good pdf reader is FoxitReader.

Tools to manipulate doc file and convert them to pdf

I am looking for some good tools (free or paid, though free tool is always preferred)
for doing following operations on word doc files:
Manipulation of doc/docx/text files (like replacing some placeholders with DB values) as well as
converts doc files to .pdf
Because, I will be using this tool in my WCF service library,
So I am looking for a code library and not for a GUI based product.
Please share your experience regarding same.
Thank you!
Aspose has a decent collection of MS Office and PDF manipulation libraries.
Aspose Homepage
On the off chance that you're only looking for PDFs for viewing or archival purposes, you could also setup a PDF print driver and print your office files into a given location using Automation. You could also edit Office files through Automation although this may be tedious.
VSTO would give you access to the save as PDF from the Office applications.
Please see my answer to a related question on SO where I recommend a number of ways to convert your Word document to a format that is more easy to manipulate programmatically (using XSL-FO).

Is there a free way to convert RTF to PDF?

How can I programmatically convert RTF documents to PDF?
OpenOffice.org can be run in server mode (i.e. without any GUI), can read RTF files and can output PDF files.
You have a number of options depending on:
the platform(s) your application will be running on
whether your application will be a server application (e.g. a web service that you set up once and then it runs), or a widely-available desktop application (e.g. something that must be easily downloadable and installable by many people)
whether you are willing to put little or more programming effort into getting the solution to work
whether you are flexible as to the programming language you will use
Here are some options:
PDFCreator + COM
Windows only
suitable for both desktop and server applications
medium programming effort
any language that allows you to speak COM
OpenOffice ( + JODConverter - optional )
Cross-platform (Windows, Linux, etc.)
suitable for server applications, as OpenOffice is a 100MB+ download
low programming effort
Java (if using JODConverter), or any language that can interface with OpenOffice's UNO
IText + Apache POI
Cross-platform (Windows, Linux, etc.)
suitable for both desktop and server applications
high programming effort
Java
EDIT
Here is an older post that has some commonality with your question.
EDIT 2
I see from your comments that you are on Linux and open to either C++ or Java. Definitely use option 2.
JODConverter (Java): the library takes care of spawning OpenOffice in headless mode and talking Uno to it on your behalf. You provide JODConverter with an input and output file name as well as the input and output types (e.g. rtf and pdf), and when it returns to you the output file is ready.
C++: you can fork+exec one (or more, for load balancing) OpenOffice instances in headless mode (soffice will listen for UNO requests on a socket e.g. port 8100.) From your application use Uno/CPP to instruct OpenOffice to perform the conversion the same way JODConverter does (see the JODConverter source code for how to do this.)
/opt/openoffice.org3/program/soffice.bin \
-accept=socket,host=127.0.0.1,port=8100;urp; \
-headless -nocrashreport -nodefault \
-nolockcheck -nologo -norestore
I am successfully using JODConverter from a Java app to convert miscellaneous document types (some documents dynamically generated from templates) to pdf.
Four years late to the party here, but I use Ted in my web application. I generate RTF programmatically, then use the rtf2pdf.sh script included in the package to generate the PDF. I tried OOo and unoconv previously, but Ted proved faster and more reliable in my application.
Use PDFCreator, a free pdf printer. Just print to pdf. You can control this through COM. Example code is in the COM folder of the install directory.
PDFCreator for windows is the easiest for single documents.
It's also possible to automate PDF creation for large sets of documents by converting them to XML and using XSLT and XSL-FO. There are lots of tutorials for this out there.
For a specific language, such as python, libraries exist to output to PDF fairly trivially.
The only advantage of XML over other simpler solutions is extensibility. You could also programmatically output your document in RTF, HTML, TXT, or just about any other text format.
LibreOffice can convert RTF documents to PDF via command line.
Here are the instructions to install it on CentOS.
And this is an example to initiate conversion from PHP code:
<?php shell_exec('libreoffice4.2 --headless --invisible --norestore --convert-to pdf test.rtf'); ?>
PrimoPDF. It acts as a virtual printer, so you just print to it, and out pops a PDF.
Look at PDF Printer

How to convert Word and Excel documents to PDF programmatically?

We are developing a little application that given a directory with PDF files creates a unique PDF file containing all the PDF files in the directory. This is a simple task using iTextSharp. The problem appears if in the directory exist some files like Word documents, or Excel documents.
My question is, is there a way to convert word, excel documents into PDF programmatically? And even better, is this possible without having the office suite installed on the computer running the application?
Office 2007 allows for this. I have found PDFCreator to be good, the VBA is included in sample files, and have heard that CutePDF is also good. PDFCreator and CutePDF are free.
To work without Office, you would need viewers, as far as I know:
http://www.microsoft.com/downloads/details.aspx?FamilyID=c8378bf4-996c-4569-b547-75edbd03aaf0&displaylang=EN
http://www.microsoft.com/downloads/details.aspx?familyid=95E24C87-8732-48D5-8689-AB826E7B8FDF&displaylang=en
I needed to do this myself, but managed to get it done with .Net and without 3rd party tools:
MSDN: Saving Word 2007 Documents to PDF and XPS Formats
Pretty simple, about 50 lines of code. However I think you will need Word 2007 installed on the machine as well as the ability to Save As PDF
To convert Word documents to PDF, take a look at jWordConvert, a java library that can do exactly that. This will not work with the Excel files though, only with the Word files. The language is not Sharp, it's Java but you could switch to use IText (which is java) instead of ITextSharp.
You can also use a component like activePDF's DocConverter to convert a lot formats to PDF.
Use PDF maker that comes with adobe 7- 9
I just used this code Covert Doc to PDF
I'm surprised Aspose wasn't mentioned here, it's easy, simple, and reliable. Downside is that it is not free.
I've used iTextSharp in the past, it's really good, easy to install (one DLL I believe), the merge takes a bit of tindering so it's not as easy to use as Aspose, but hey, it's free so that is the best part.
TallPDF.NET (comes with a hefty price tag) allows you to serve dynamic PDF from any .NET application including ASP.NET pages and web services.
PDFEdit (free and open source) is an editor for manipulating PDF documents. It has a GUI version and a command-line interface. Scripting is used to a great extent in the editor and almost anything can be scripted. It is possible to create your own scripts or plugins.
The most common way to convert files to a pdf is to print them to a pdf printer driver. There are a number of such drivers, one that i know of that will do the job is Black Ice.
Another is to use Adobe Acrobat's SDK. from memory its very expensive.
Its been a while since i have actually done any work with converting pdf's and the landscape may have changed.