A technology for reading PDFs online with annotations?

Is there an open source solution that displays PDFs for online reading? It has to be searchable, much like Google Books, and if possible it should be able to display annotations.

By "online reading" I'll assume you mean without a PDF reader plugin on the client. In that case you'll need to convert to HTML:
http://pdftohtml.sourceforge.net/
If you don't mind losing the ability to copy text, then converting to PNG may give you a more accurate rendering:
http://www.imagemagick.org/
Regardless of the output format, you can manage your searching using the original PDF data. One technology for this is mnoGoSearch:
http://www.mnogosearch.org/
mnoGoSearch uses pdftotext internally; you may find this useful if you want to write your own search routines. pdftotext is part of the Xpdf suite of utilities:
http://www.foolabs.com/xpdf/about.html
All of the tools listed above are available on Windows and Linux.
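
If you do write your own search routine on top of pdftotext, a minimal sketch could look like the following. It assumes the pdftotext binary is on your PATH and relies on pdftotext separating pages with a form feed character; the function names are my own, not part of Xpdf or mnoGoSearch.

```python
import subprocess


def split_pages(text):
    """Split pdftotext output into per-page strings on the form feed character."""
    pages = text.split("\f")
    # pdftotext ends the last page with a form feed too, leaving an empty tail.
    if pages and pages[-1].strip() == "":
        pages.pop()
    return pages


def extract_pages(pdf_path):
    """Return the text of each page by running pdftotext (assumed on PATH).

    Passing "-" as the output file sends the extracted text to stdout.
    """
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", "-layout", pdf_path, "-"],
        capture_output=True,
        check=True,
    )
    return split_pages(result.stdout.decode("utf-8", errors="replace"))


def search_pages(pages, query):
    """Return 1-based page numbers whose text contains query (case-insensitive)."""
    needle = query.lower()
    return [number for number, page in enumerate(pages, start=1)
            if needle in page.lower()]
```

Calling `search_pages(extract_pages("book.pdf"), "annotation")` would then give you the page numbers to jump to in whatever HTML or PNG rendering you serve. A real deployment would want an index rather than a linear scan.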

You may also be interested in the Vuzit DocuPub Platform: http://vuzit.com/products/docupub_platform
The display technology itself is not open source, but they provide an API to access their service, so perhaps it is worth investigating.

I don't know if you are looking for software to install or a service to pay for...
I've read a lot about www.getbackboard.com (this is not advertising, just something I've read about that may fit your needs).

Not sure if they do annotations, but both of these will show PDFs quite well:
http://pdfmenot.com
http://docs.google.com

ICEPdf recently released their code as open source. It is Java based.

PyPdf is really nice. It supports reading text as well as encryption, which I know iTextSharp does not.
Of course you'd have to program in Python, as IronPython's class libraries aren't quite at the point where you can reference them from another language and use them. (But I imagine they will be someday soon.)
PyPdf
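
As a rough illustration of reading text from a possibly encrypted file, here is a small sketch. It is written against the names used by the current pypdf package (`PdfReader`, `is_encrypted`, `decrypt`, `pages`, `extract_text`), which is an assumption on my part; the older pyPdf releases spelled these differently. The helper name `extract_text` is my own.

```python
def extract_text(reader, password=""):
    """Return the extracted text of every page of an opened PDF.

    `reader` is assumed to behave like pypdf's PdfReader: it exposes
    `is_encrypted`, `decrypt(password)` and an iterable `pages` whose
    items have `extract_text()`.
    """
    if reader.is_encrypted:
        # Encrypted files must be unlocked before any page can be read.
        reader.decrypt(password)
    # extract_text() may yield nothing for image-only pages; normalise to "".
    return [page.extract_text() or "" for page in reader.pages]


# Typical use, assuming the pypdf package is installed and "book.pdf" exists:
#   from pypdf import PdfReader
#   pages = extract_text(PdfReader("book.pdf"))
```

Text extraction quality depends heavily on how the PDF was produced, so treat the result as search input rather than a faithful copy of the page.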

This is not open source, but check it out anyway. You can download a free trial of their SDK to try it out. Reading PDFs and their annotations is not simple, and I wouldn't trust a production app to open source decoders.
Here is an online demo:
http://www.atalasoft.com/ajaxannotations/default.aspx
Another good pdf reader is FoxitReader.

Related

Document creation libraries and formats?

I am going to start my final project for the spring semester at school and am looking at feasibility before I commit to it. One of my potential projects requires me to make an archivable document of web sessions. These archives should be searchable (and, if possible, nicely designed). PDF and OpenDocument formats are in mind for now. Is there anything else I can look into besides these? I want to make sure that I pick the right plan before school starts so that I can be confident about it. I have to use C#/.NET for this.
Any suggestions are welcome.
Regards,
Lalith
If you want to convert logs into PDF, you can use third-party libraries. There are plenty of C#/.NET libraries available, such as:
iTextSharp (not free for commercial use)
Report.NET (free, no support)
PdfSharp (free, no support yet)
Gnostice PDFOne .NET (not free, with full help and support)
But if you want to create PDF using C# yourself, it is pretty hard work, since the PDF format (which borrows its page-description model from PostScript) may be new and complex to you. First study the format you are going to use and make sure you can implement it. I would suggest sticking with PDF since it is platform- and editor-independent.
http://www.gnupdf.org/Category:PDF

Haskell: parsing PDF

What I need is to read a PDF, make some transformations (generate TOC bookmarks), and write it back.
I found this http://hackage.haskell.org/package/HPDF , but it only mentions generating pdf, not the parsing (although I could have missed it)
Haskell is chosen purely for (self)educational purposes.
There are a few tools for PDF manipulation, though they seem to bias towards generation, rather than parsing:
http://johnmacfarlane.net/pandoc/
Pandoc is a great cross-markup library, but doesn't support PDF parsing (it does support PDF generation from a variety of formats).
There's also:
http://hackage.haskell.org/package/HsHaruPDF
http://hackage.haskell.org/package/pdf2line -- tool for extracting text from pdf
http://hackage.haskell.org/package/HPDF -- another pdf generation library
I'm not sure we have a good parsing tool yet.
Also as a learning exercise, I started a PDF parsing library in Haskell, but it's incomplete and has been languishing a bit from lack of attention. I'd be happy to share it with you, and would love feedback, improvements, etc. It's not currently hosted on hackage, but if you're interested in working with an incomplete implementation, let me know and I'll ask some colleagues for advice on getting it up there.
Here's a Haskell binding to parts of xpdf:
http://hackage.haskell.org/package/pdf2line
Check out the pdf-toolbox library. Its support for generating PDF files is low level, but powerful enough for your task.
Here is an example of how to change the title of an existing PDF file using the incremental update feature.
Another package to consider is rakhana, which is also on Hackage.

Add watermark to various documents investigation

I've been asked to investigate the feasibility of adding watermarks to documents when printed through our application. The documents will consist of Word, PDF and CAD files.
The interface of the application is VB6 with a plethora of VC6 DLLs.
I can see a couple of possible solutions:
Convert all documents to PDF, add a watermark and then print.
Find a print driver that will add a watermark to all documents prior to printing, install it, and re-enable it at runtime if it gets disabled for any reason.
Third-party suites are a possibility (we use Volo View Express for viewing CAD files), but since this application is nearing end-of-life we wouldn't want to spend too much on it.
Has anyone had any experience of the above? Any gotchas that will bog me down?
Tracker Software has a good set of PDF APIs that will allow you to implement the solution you already have in mind. I've used their Image and PDF libraries quite a bit with a lot of success in both VB6 and .NET. Single user licenses are not expensive (depending on how you look at it, I guess), and I've found support to be excellent as well.

Command line tool for generating PDFs from various document types?

I'm looking for ways to generate PDFs on the fly, preferably using a command line tool, as this will be done from a web-based system.
My requirements: it must work on Windows and Linux, and it should be able to convert Microsoft Word, Excel and HTML into PDF.
The ability to concatenate or merge various documents into one PDF output file would also be good.
Any suggestions? I would prefer to avoid applications that work as "printer drivers".
Many thanks
After doing some research, the best solution I found, one that could handle all the file formats we needed to convert and that ran on both Linux and Windows, was an elegant, lightweight Python script called PyODConverter. It uses OpenOffice (which itself runs in server mode) to do the actual conversions, and it works beautifully. I used a separate tool called PDFTK to do the PDF concatenation, as I found that ImageMagick loses a lot of information (and creates huge files).
If you find PyODConverter too limited, there is also a more powerful, heavyweight option by the same author called JODConverter.
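
For the concatenation step, a minimal sketch of driving PDFTK from a web backend might look like this. It assumes the `pdftk` binary is on your PATH and uses its `cat ... output` operation; the function names are my own. The conversion step itself is left out because how you invoke PyODConverter depends on your OpenOffice setup.

```python
import subprocess


def pdftk_cat_command(inputs, output):
    """Build the pdftk command that concatenates `inputs`, in order, into `output`."""
    if not inputs:
        raise ValueError("need at least one input PDF")
    return ["pdftk", *inputs, "cat", "output", output]


def merge_pdfs(inputs, output):
    """Run pdftk (assumed to be on PATH) to merge the given PDFs.

    Arguments are passed as a list rather than a shell string, so file
    names with spaces need no quoting.
    """
    subprocess.run(pdftk_cat_command(inputs, output), check=True)
```

For example, `merge_pdfs(["cover.pdf", "report.pdf"], "combined.pdf")` would write the two files back to back into `combined.pdf`.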
Calibre runs on Linux, Windows, and Mac OS X and has command line tools on all three. It can translate a great many document types to PDF and other formats.
(Disclaimer: I'm a heavy user, help out on Calibre's IRC channel, and have been poking at development, so I'm just a bit biased.)
I think this has a command line utility, but I'm not sure. Check this:
PDF Creator
Have a look at bioPDF, and a PDF printer that uses it called Bullzip PDF. Check the documentation for Bullzip PDF for examples of how it can be automated. It has an API as well as the GUI.

Any way to automatically convert DWF to PDF?

Our eTendering solution, www.monaqasat.com, currently works exclusively with PDF documents for various reasons, security being one of them. We are being asked if we can support DWF documents. For this to happen, we would need to find a way to automatically convert DWF documents to PDF, using some kind of Unix application.
Does anybody know of any such application, preferably one usable from Rails or Java?
Thanks,
.Karim
http://www.autodwg.com/pdf/
http://www.dwgto.com/
http://www.aidecad.com/
http://en.wikipedia.org/wiki/List_of_PDF_software
http://www.cogniview.com/convert-pdf-to-excel/category/pdf/
My suggestion would be to install a software printer, call its API to pass in the DWF and get back a PDF, and then apply security as needed.
Autodesk has its DWF Toolkit available at:
http://www.autodesk.com/dwftoolkit
It contains full C++ source code to read and write DWF files, so it should be reasonably easy to make it run under Linux and to use a PDF library to write the output.