I've got a content management solution where we present scanned images (TIFF), PDFs, and Word docs for viewing. While we can simply embed a PDF, depending on user preferences that can be fiddly and is not always intuitive for users.
I'd like a solution like Scribd, Embedit, etc., but not hosted. I want to run the application on our own servers and manage it that way (for legal reasons, and because our clients won't buy the service if it's hosted somewhere else).
SWFTools looks a little basic for my needs, plus it doesn't do doc, docx or ppt.
Any options? It doesn't have to be free, but free would be ideal.
As far as I understand, Scribd uses SWFTools, and it is not basic; it is amazingly flexible. Convert everything to PDF, then use SWFTools to convert the PDFs to SWF, or to something like what Scribd uses (SCB, as they call it, a modified SWF).
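A minimal sketch of the PDF-to-SWF step, assuming SWFTools is installed and its `pdf2swf` executable is on the PATH. The function names are made up for illustration, and only the basic `input -o output` form of the command is used:

```python
import subprocess
from pathlib import Path


def pdf2swf_command(pdf_path, swf_path=None):
    """Build the argument list for SWFTools' pdf2swf (hypothetical helper)."""
    pdf = Path(pdf_path)
    # Default to the same file name with a .swf extension.
    swf = Path(swf_path) if swf_path else pdf.with_suffix(".swf")
    return ["pdf2swf", str(pdf), "-o", str(swf)]


def convert_pdf_to_swf(pdf_path, swf_path=None):
    """Run pdf2swf and return the output path.

    Raises subprocess.CalledProcessError if pdf2swf exits with an error.
    """
    cmd = pdf2swf_command(pdf_path, swf_path)
    subprocess.run(cmd, check=True)
    return cmd[-1]
```

Calling `convert_pdf_to_swf("doc.pdf")` would write `doc.swf` next to the input. Other formats would first have to be converted to PDF by some other tool.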
webSupergoo has a .net component that will do this...
Their ABCpdf component can import and export a wide range of graphic and document formats, including those you've mentioned.
The installation also contains an SWF demo project that can be freely adapted, and used as the basis for a scribd-like service.
http://www.websupergoo.com/products.htm
You can try this alternative solution: FreepapeR.
It displays PDF documents. The PDF is converted with SWFTools (pdf2swf), either by PHP on the server side or locally by hand, and the user interface is written in AS3.
Hope this helps...
I'm trying to develop a tool or web application that imports a PDF file and lets me select the text and images in it with a mouse click, then mark each selection as title, content or image with a button click (three different buttons). The marked content and images will be copied to the clipboard or pasted into a Word document, which will be another part of the project. Which programming language would be suitable for this?
I'd probably start by researching a pure browser-side solution using pdf.js and the Clipboard API.
Otherwise, you'd still need the Clipboard API in the browser, and the server side could be powered by any programming language that can be hooked into a web server and has a library to parse PDFs.
You said nothing at all about your prospective server platform, but to name a few: .NET has PdfSharp, which is able to read PDFs, and Python has a host of tools available. There are also a bunch of command-line utilities for extracting data from PDFs, which can be called from any language able to run external processes.
Note that this only appears to be simpler than using pdf.js. Unless your PDFs are really uniform (say, invoices created by some piece of software), so that your PDF parser can know which bits of data to extract and return, the parser will need to return all the data it extracted to the client, and you'll need to somehow render it all there. Maybe that's exactly what you need, but maybe not.
Since PDFs are really tailored for typesetting and not presenting information in a structured manner, I'd try to piggyback on an already hard-core PDF rendering solution which runs in the browser, so see above.
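For the server-side route, here is a rough sketch of calling an external command-line extractor from Python. It assumes the `pdftotext` utility (from Xpdf or Poppler) is installed and on the PATH. The helper names are hypothetical, and this only returns plain text, not images or positions:

```python
import subprocess


def pdftotext_command(pdf_path, keep_layout=False):
    """Build the pdftotext argument list (hypothetical helper)."""
    cmd = ["pdftotext", "-enc", "UTF-8"]
    if keep_layout:
        # -layout asks pdftotext to preserve the physical page layout.
        cmd.append("-layout")
    # A text-file argument of "-" sends the extracted text to stdout.
    cmd += [pdf_path, "-"]
    return cmd


def extract_text(pdf_path, keep_layout=False):
    """Run pdftotext and return the extracted text as a string."""
    result = subprocess.run(
        pdftotext_command(pdf_path, keep_layout),
        check=True,
        capture_output=True,
    )
    return result.stdout.decode("utf-8", errors="replace")
```

A web handler could return the output of `extract_text(...)` to the browser, which would still have to render it and handle the selection and clipboard part.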
I've been looking for ways to generate thumbnails from PDFs, like the ones shown in Explorer. The problem is that without Adobe Pro, the free version does not expose all the COM interfaces. Is there any other way? Please help.
Ghostscript (which is what ImageMagick uses) will generate images in a wide variety of formats. If you need something really obscure, use the ImageMagick wrapper; otherwise, I prefer to call Ghostscript directly.
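As a sketch of what calling Ghostscript directly could look like for a first-page thumbnail: the executable name `gs` is an assumption (on Windows it is typically `gswin32c` or `gswin64c`), and the helper names and the default resolution are arbitrary choices for illustration.

```python
import subprocess


def thumbnail_command(pdf_path, png_path, dpi=36, gs_exe="gs"):
    """Build a Ghostscript command that renders page 1 of a PDF to PNG."""
    return [
        gs_exe,
        "-q",                # suppress startup messages
        "-dSAFER",           # restrict file access from the PDF/PostScript
        "-dBATCH",           # exit after processing the input
        "-dNOPAUSE",         # do not prompt between pages
        "-sDEVICE=png16m",   # 24-bit colour PNG output device
        "-dFirstPage=1",
        "-dLastPage=1",
        f"-r{dpi}",          # low resolution gives a small image
        f"-sOutputFile={png_path}",
        pdf_path,
    ]


def make_thumbnail(pdf_path, png_path, dpi=36, gs_exe="gs"):
    """Run Ghostscript; raises CalledProcessError on failure."""
    subprocess.run(thumbnail_command(pdf_path, png_path, dpi, gs_exe), check=True)
    return png_path
```

The thumbnail size follows from the page size and `dpi`; a US Letter page at 36 dpi comes out at roughly 306 x 396 pixels.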
If you can afford a commercial option, you could use Amyuni PDF Creator ActiveX for this task, (or .Net version if that suits your needs better). Using this product you can create jpg/png/bmp images from the first page of your PDF files with the specified resolution, and then use them as thumbnails.
Disclaimer: I am part of the development team of this product.
Here are other SO questions proposing other approaches (not involving COM):
Using ImageMagick in command line
Thumbnail of a PDF page (Java)
Is there an open source solution that displays PDFs for online reading? It has to be searchable, much like Google Books, and if possible able to display annotations.
By "online reading" I'll assume you mean without a PDF reader plugin on the client. In that case you'll need to convert to HTML.
http://pdftohtml.sourceforge.net/
If you don't mind losing the ability to copy text, then converting to PNG may give you a more accurate rendering.
http://www.imagemagick.org/
Regardless of the output format you can manage your searching using the original PDF data. One technology for this is mnogosearch
http://www.mnogosearch.org/
mnogosearch uses pdftotext internally, which you may find useful if you want to write your own search routines. pdftotext is part of the Xpdf suite of utilities.
http://www.foolabs.com/xpdf/about.html
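If you do write your own search routine on top of pdftotext, one simple approach (a sketch, not how mnogosearch works) relies on the fact that pdftotext separates pages in its output with a form feed character. The function name is made up, and the input is assumed to be text already produced by pdftotext:

```python
def pages_matching(text, query):
    """Return the 1-based page numbers whose text contains the query.

    `text` is assumed to be pdftotext output, in which pages are
    separated by form feed characters. Matching is case-insensitive
    plain substring search, nothing smarter.
    """
    needle = query.casefold()
    if not needle:
        return []
    pages = text.split("\f")
    return [
        number
        for number, page in enumerate(pages, start=1)
        if needle in page.casefold()
    ]
```

The page numbers can then be used to jump to the matching HTML or PNG rendering of that page.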
All of the tools listed above are available on Windows and Linux.
You may also be interested in the Vuzit DocuPub Platform: http://vuzit.com/products/docupub_platform
The display technology itself is not open source, but they provide an API to access their service, so perhaps it is worth investigating.
I don't know if you are looking for software to install or a service to pay for...
I've read a lot about www.getbackboard.com (this is not advertising, only reporting something I've read about, that maybe fits your needs.. ;)
Not sure if they do annotations, but both of these will show PDFs quite well:
http://pdfmenot.com
http://docs.google.com
ICEPdf recently released their code as open source. It is Java based.
PyPdf is really nice. It supports reading the text as well as encryption, which, as far as I know, iTextSharp does not.
Of course you'd have to program in Python, as IronPython's class libraries aren't quite to the point where you can reference them from another language and use them. (But I imagine they will be someday soon.)
PyPdf
This is not open source, but check it out anyway. You can download a free trial of their SDK to try it out. Reading PDFs and their annotations is not simple, and I wouldn't trust a production app to open source decoders.
Here is an online demo.
http://www.atalasoft.com/ajaxannotations/default.aspx
Another good pdf reader is FoxitReader.
Clutching at straws here, I think I remember seeing a solution to this somewhere but can't find it now.
The issue is that I need a Windows application (not .Net) to be able to generate PDFs. The "standard" solution is to use something like PDF995 or CutePDF which create a dummy printer that your application can then print to and it is redirected to a PDF file. The problem is that to control those printers requires updating INI files or registry keys and that is error prone and often runs into concurrency problems.
Building the PDF file programmatically isn't an option, it needs to be able to take the output that would normally be sent to a printer, or possibly convert directly from an Excel file.
Ideally, I'd just pass the Excel file to a COM/ActiveX object and it would write to a file I specify. Next best option would be for it to create a separate printer per print job or have some reasonable way of guaranteeing the filename I give will have the document I print.
This Excel to PDF Batch converter might do the trick, as at least it has a command line mode. Has anyone tried it? It would only solve the problem for Excel files, though.
So, is there a better solution?
(As a side note, for Visual FoxPro reports XFRX works really well, it converts the report directly to a PDF without needing a printer driver.)
You might want to look at BullZip (google it because I cannot add hyperlinks yet). We recently had Jody Meyer present this tool at the Detroit Area Fox User Group (previously shown at the Grand Rapids Area Fox User Group too). It was a great session.
She showed how to use the COM object to automate a ton of the BullZip features including the name of the file and properties like author and keywords. Watermarks are a snap too. It is simple and straightforward and her example was rock solid. Tons of features already done for you so you can simply re-engineer the demo form.
You can download it on the DAFUG Web site, in the downloads folder. File name is BullZipDemo.zip (google Detroit Area Fox User Group) and add the folder and filename.
Rick
VFP MVP
For this scenario I would recommend Amyuni PDF Converter. It provides a Microsoft Certified PDF Printer and ActiveX/.Net controls to communicate with it. Concurrency issues can be avoided by using these controls.
Disclaimer: I am part of the development team of this product.
We are developing a little application that, given a directory of PDF files, creates a single PDF file containing all the PDF files in the directory. This is a simple task using iTextSharp. The problem appears if the directory also contains other files, such as Word or Excel documents.
My question is: is there a way to convert Word and Excel documents into PDF programmatically? And even better, is this possible without having the Office suite installed on the computer running the application?
Office 2007 allows for this. I have found PDFCreator to be good, the VBA is included in sample files, and have heard that CutePDF is also good. PDFCreator and CutePDF are free.
To work without Office, you would need viewers, as far as I know:
http://www.microsoft.com/downloads/details.aspx?FamilyID=c8378bf4-996c-4569-b547-75edbd03aaf0&displaylang=EN
http://www.microsoft.com/downloads/details.aspx?familyid=95E24C87-8732-48D5-8689-AB826E7B8FDF&displaylang=en
I needed to do this myself, but managed to get it done with .Net and without 3rd party tools:
MSDN: Saving Word 2007 Documents to PDF and XPS Formats
Pretty simple, about 50 lines of code. However, I think you will need Word 2007 installed on the machine, along with its ability to Save As PDF.
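The MSDN code is .NET and is not reproduced here. As a rough sketch of the same idea through COM automation from Python, the following assumes Windows, an installed Word 2007 or later with PDF saving available, and the third-party pywin32 package. The function names are hypothetical.

```python
from pathlib import Path

# Value of Word's wdFormatPDF constant in the WdSaveFormat enumeration.
WD_FORMAT_PDF = 17


def pdf_path_for(doc_path):
    """Return the output path: same name, .pdf extension."""
    return str(Path(doc_path).with_suffix(".pdf"))


def word_to_pdf(doc_path):
    """Open a document in Word via COM and save it as PDF."""
    # Imported lazily: pywin32 is only available on Windows (assumption).
    import win32com.client

    source = Path(doc_path).resolve()  # Word needs an absolute path
    target = pdf_path_for(source)
    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        doc = word.Documents.Open(str(source))
        try:
            doc.SaveAs(target, FileFormat=WD_FORMAT_PDF)
        finally:
            doc.Close(False)  # close without saving changes to the original
    finally:
        word.Quit()
    return target
```

This still drives a real Word instance, so it does not remove the requirement to have Office on the machine.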
To convert Word documents to PDF, take a look at jWordConvert, a Java library that can do exactly that. It will not work with Excel files, though, only with Word files. It is Java rather than C#, but you could switch to iText (which is Java) instead of iTextSharp.
You can also use a component like activePDF's DocConverter to convert a lot of formats to PDF.
Use the PDFMaker that comes with Adobe Acrobat 7-9.
I just used this code: Convert Doc to PDF
I'm surprised Aspose wasn't mentioned here. It's easy, simple, and reliable. The downside is that it is not free.
I've used iTextSharp in the past. It's really good and easy to install (one DLL, I believe). The merge takes a bit of tinkering, so it's not as easy to use as Aspose, but hey, it's free, so that is the best part.
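For comparison, here is what the directory merge could look like in Python. This is not iTextSharp; it swaps in the third-party pypdf package (assumed to be installed, version 3 or later), and the function names are made up for illustration. Non-PDF files such as Word or Excel documents are simply skipped, not converted.

```python
import os


def select_pdfs(filenames):
    """Keep only names ending in .pdf (any case), in sorted order."""
    return sorted(name for name in filenames if name.lower().endswith(".pdf"))


def merge_directory(directory, output_path):
    """Append every PDF in `directory` to one output file.

    Write the output outside `directory`, otherwise a previous
    result would be merged in again on the next run.
    """
    # Imported lazily so the pure helper above works without pypdf.
    from pypdf import PdfWriter

    writer = PdfWriter()
    for name in select_pdfs(os.listdir(directory)):
        writer.append(os.path.join(directory, name))
    with open(output_path, "wb") as out:
        writer.write(out)
    writer.close()
    return output_path
```

Word and Excel files would need one of the converters discussed in this thread before they can be included.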
TallPDF.NET (comes with a hefty price tag) allows you to serve dynamic PDF from any .NET application including ASP.NET pages and web services.
PDFEdit (free and open source) is an editor for manipulating PDF documents. It has a GUI version and a command-line interface. Scripting is used to a great extent in the editor and almost anything can be scripted. It is possible to create your own scripts or plugins.
The most common way to convert files to PDF is to print them to a PDF printer driver. There are a number of such drivers; one that I know will do the job is Black Ice.
Another option is to use Adobe Acrobat's SDK. From memory, it's very expensive.
It's been a while since I have actually done any work converting PDFs, and the landscape may have changed.