I am looking for some open-source suggestions for a CMS that can handle a lot of PDF files. I have thousands of PDF files and I would like to use a CMS that makes handling them as easy as possible.
Thanks for any suggestions.
Plone has great support for binary files like PDFs out of the box: upload and download, managing security/access restrictions, managing their caching, and full-text indexing of the PDFs for search.
I don't know your requirements, but if I were you, I'd store an editable copy of the CMS content (in HTML) and render it to PDF each time it is saved.
To render the HTML as faithfully as possible, use a tool like wkhtmltopdf, which is easy to call from PHP.
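A minimal sketch of that save-then-render idea, written in Python rather than PHP for illustration. It assumes `wkhtmltopdf` is installed and on the PATH; `save_page` is a hypothetical CMS save hook, not part of any real CMS API. The basic CLI form is `wkhtmltopdf <input> <output>`, where the input can be a local HTML file or a URL.

```python
import subprocess
from pathlib import Path


def wkhtmltopdf_command(source: str, pdf_path: Path) -> list[str]:
    """Build the basic wkhtmltopdf call: wkhtmltopdf <input> <output>.

    `source` may be a local HTML file path or a URL.
    """
    return ["wkhtmltopdf", source, str(pdf_path)]


def save_page(html: str, html_path: Path, run=subprocess.run) -> Path:
    """Hypothetical CMS save hook: keep the editable HTML, then re-render the PDF."""
    html_path.write_text(html, encoding="utf-8")
    pdf_path = html_path.with_suffix(".pdf")
    # check=True raises CalledProcessError if wkhtmltopdf exits non-zero.
    run(wkhtmltopdf_command(str(html_path), pdf_path), check=True)
    return pdf_path
```

Passing `run` in as a parameter keeps the hook testable without invoking the real binary.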
Currently, I have a PDF that is not searchable, and I am wondering what the best process is for preparing the file for ColdFusion so I can index it.
In particular, I am wondering whether a PDF file needs to contain machine-readable text before I use the extracttext action of cfpdf to pull the text from it.
I really appreciate the advice, and I hope it helps other people who are interested in indexing PDF files with ColdFusion.
I was considering extracting the text with Tesseract, as suggested here:
Performing Optical Character Recognition on PDF's from ColdFusion using a Java or .NET Library?
but if there is a built-in feature in ColdFusion, I would much rather use that, and I think it would be more helpful to other people to know whether ColdFusion can natively handle this task.
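For reference, the rasterize-then-OCR route can be sketched outside ColdFusion. The Python below is a minimal sketch, assuming Ghostscript (`gs`) and Tesseract (`tesseract`) are on the PATH; the 300 dpi grayscale setting and the `page-NNN` file names are illustrative choices, not requirements. Each page is rendered to a PNG, Tesseract writes `<base>.txt` next to each image, and the page texts are joined for indexing.

```python
import subprocess
from pathlib import Path

# Assumption: gs and tesseract are installed; file names are hypothetical.


def rasterize_command(pdf: Path, out_dir: Path, dpi: int = 300) -> list[str]:
    """Ghostscript: render every page of `pdf` to page-001.png, page-002.png, ..."""
    return [
        "gs", "-dSAFER", "-dBATCH", "-dNOPAUSE",
        "-sDEVICE=pnggray", f"-r{dpi}",
        f"-sOutputFile={out_dir / 'page-%03d.png'}",
        str(pdf),
    ]


def ocr_command(image: Path) -> list[str]:
    """Tesseract: `tesseract page-001.png page-001` writes page-001.txt."""
    return ["tesseract", str(image), str(image.with_suffix(""))]


def ocr_pdf(pdf: Path, out_dir: Path) -> str:
    """Rasterize a scanned PDF, OCR each page, and return the combined text."""
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(rasterize_command(pdf, out_dir), check=True)
    texts = []
    for image in sorted(out_dir.glob("page-*.png")):
        subprocess.run(ocr_command(image), check=True)
        texts.append(image.with_suffix(".txt").read_text(encoding="utf-8"))
    return "\n".join(texts)
```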
I need to build a small PDF library that will display many catalogs. The user will be able to view a document and page through it but will not be able to download or share the documents in any way, somewhat like Google Books (here is an example).
I have in mind something like the Google Drive API or some kind of Scribd API, but I don't know whether either of those will work. I would like to know if there are other options for this kind of application, or whether the ones mentioned will do the job.
Edit: I forgot to mention that all of this is done in a web browser.
In principle, all you need is the ability to render pages from a PDF file into images. Your application (you didn't mention where you want to build this) is then responsible for displaying the images, scrolling, moving from page to page, etc.
If this is correct, there are multiple libraries that can do this:
- ImageMagick can convert PDF to images (http://www.imagemagick.org)
- Ghostscript reads PDF natively and can render PostScript or PDF into images and other formats (http://www.ghostscript.com)
- I'm sure there are many, many more...
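Whichever library does the rendering, the paging logic on the application side is small. A minimal sketch in Python, assuming the pages were rendered ahead of time to one PNG per page; the `PageViewer` class and the `/catalogs/<id>/pages/NNN.png` URL scheme are hypothetical. Only page images ever reach the browser, so the PDF file itself is never sent, although a determined user can still save the individual images.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageViewer:
    """Tracks which pre-rendered page image a reader is looking at."""

    doc_id: str
    page_count: int
    current: int = 1

    def image_url(self, page: Optional[int] = None) -> str:
        # Hypothetical URL scheme: one zero-padded PNG per page.
        n = self.current if page is None else page
        return f"/catalogs/{self.doc_id}/pages/{n:03d}.png"

    def go_to(self, page: int) -> str:
        # Clamp to 1..page_count so navigation never points at a missing image.
        self.current = max(1, min(page, self.page_count))
        return self.image_url()

    def next_page(self) -> str:
        return self.go_to(self.current + 1)

    def previous_page(self) -> str:
        return self.go_to(self.current - 1)
```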
There are also a number of commercial tools, for example those from Adobe (licensed through DataLogics, http://www.datalogics.com) and callas software (http://www.callassoftware.com; I'm affiliated with this company).
I have noticed that when you view PDFs in Google Docs, the PDF viewer renders the PDF file into PNG images.
I was wondering if you could use the Google Data API to upload a PDF and get the URLs of the rendered PNG files.
I have never used the Google API or really had the extra time to learn it, but if it helps me do this, it will be well worth the extra time.
No, you can't do it.
Google explicitly does not allow it.
Downloading PDFs and arbitrary files
Native PDF files cannot be exported in a format other than .pdf.
I doubt it. I think it would be easier and more stable to use ImageMagick or some other library to handle this. Converting PDFs to images using ImageMagick is just one CLI command.
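A sketch of that one command, wrapped in Python so the arguments are explicit. It assumes ImageMagick is installed; ImageMagick delegates PDF reading to Ghostscript, so `gs` must be present too. `-density` has to come before the input file because it applies when the PDF is read. The shell equivalent is `convert -density 150 input.pdf page-%03d.png`; on ImageMagick 7 the main command is `magick`. The function names are hypothetical.

```python
import subprocess
from typing import Optional


def pdf_to_png_command(pdf: str, pattern: str = "page-%03d.png", density: int = 150,
                       page: Optional[int] = None, binary: str = "convert") -> list[str]:
    """Build an ImageMagick command that rasterizes a PDF to PNG files.

    `page` is zero-based and uses ImageMagick's `input.pdf[N]` frame selection.
    Pass binary="magick" on ImageMagick 7 installs.
    """
    source = pdf if page is None else f"{pdf}[{page}]"
    # -density is a read setting, so it must precede the input file.
    return [binary, "-density", str(density), source, pattern]


def pdf_to_png(pdf: str, **options) -> None:
    subprocess.run(pdf_to_png_command(pdf, **options), check=True)
```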
I have a lot of different sites written in PHP (Drupal), and more and more often clients ask me to create PDFs of various lists, product descriptions, and so on. I've been using dompdf and other PDF libraries, but they are a pain to use and have very limited functionality.
Are there any services out there that'll let me generate a PDF file from a URL and let the user download the result? That would definitely save my day :)
Best regards,
Thomas
If you are trying to convert HTML to PDF, there are a couple of services out there that can do that for you (search for them), but off the top of my head, a2ps does a pretty OK job. The basic idea is that if you can generate PostScript from your source, creating a PDF is not an issue.
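A minimal sketch of that PostScript route in Python, assuming `a2ps` and Ghostscript's `ps2pdf` script are installed. Note that a2ps formats text for printing; it does not lay out HTML the way a browser would. The function names and the `.ps` intermediate file name are hypothetical.

```python
import subprocess


def a2ps_command(source: str, ps_out: str) -> list[str]:
    # a2ps formats a text file as PostScript; -o names the output file.
    return ["a2ps", "-o", ps_out, source]


def ps2pdf_command(ps_in: str, pdf_out: str) -> list[str]:
    # ps2pdf is the PostScript-to-PDF wrapper script shipped with Ghostscript.
    return ["ps2pdf", ps_in, pdf_out]


def text_to_pdf(source: str, pdf_out: str, run=subprocess.run) -> None:
    """Two-step pipeline: source -> PostScript -> PDF."""
    ps_tmp = pdf_out + ".ps"  # hypothetical intermediate file name
    run(a2ps_command(source, ps_tmp), check=True)
    run(ps2pdf_command(ps_tmp, pdf_out), check=True)
```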
If you are looking for a more full-featured library, iText can do it (it's Java, though, and not free for commercial use).
I've got a content management solution where we present scanned images (TIFF), PDFs, and Word docs for viewing. While we can simply embed a PDF, depending on user preferences this is sometimes a bit fiddly and not intuitive for users.
I'd like a solution like Scribd, embedit, etc., but not hosted. I want to run the application on our own servers and manage it that way (for legal reasons, and because our clients won't buy the service if it's hosted somewhere else).
SWFTools looks a little basic for my needs, and it doesn't handle DOC, DOCX, or PPT.
Any options? It doesn't have to be free, but that would be ideal.
As far as I understand, Scribd uses SWFTools. And it is not basic; it is amazingly flexible. Convert everything into PDF and use SWFTools to convert the PDFs into SWF, or into something like what Scribd does (SCB, as they call it, a modified SWF).
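The convert-everything-to-PDF-first pipeline can be sketched in Python as follows. `pdf2swf input.pdf -o output.swf` is the SWFTools command. The Office-to-PDF step is an assumption, since neither answer names a converter; this sketch uses LibreOffice's `soffice --headless --convert-to pdf`. The function names and the supported-suffix set are hypothetical.

```python
import subprocess
from pathlib import Path

# Assumption: LibreOffice (soffice) handles the Office-to-PDF step.
OFFICE_SUFFIXES = {".doc", ".docx", ".ppt", ".pptx"}


def to_pdf_command(src: Path, out_dir: Path) -> list[str]:
    # Writes out_dir/<stem>.pdf
    return ["soffice", "--headless", "--convert-to", "pdf", "--outdir", str(out_dir), str(src)]


def pdf2swf_command(pdf: Path, swf: Path) -> list[str]:
    # SWFTools: pdf2swf file.pdf -o file.swf
    return ["pdf2swf", str(pdf), "-o", str(swf)]


def conversion_plan(src: Path, out_dir: Path) -> list[list[str]]:
    """Return the commands that turn `src` into out_dir/<stem>.swf."""
    suffix = src.suffix.lower()
    steps = []
    if suffix == ".pdf":
        pdf = src
    elif suffix in OFFICE_SUFFIXES:
        steps.append(to_pdf_command(src, out_dir))
        pdf = out_dir / (src.stem + ".pdf")
    else:
        raise ValueError(f"unsupported format: {src.suffix}")
    steps.append(pdf2swf_command(pdf, out_dir / (src.stem + ".swf")))
    return steps


def convert(src: Path, out_dir: Path) -> None:
    for cmd in conversion_plan(src, out_dir):
        subprocess.run(cmd, check=True)
```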
webSupergoo has a .NET component that will do this.
Their ABCpdf component can import and export a wide range of graphic and document formats, including those you've mentioned.
The installation also contains an SWF demo project that can be freely adapted, and used as the basis for a scribd-like service.
http://www.websupergoo.com/products.htm
You can try this alternative solution:
FreepapeR.
It can display PDF documents. The PDF is converted with SWFTools (pdf2swf), either by PHP on the server side or locally by hand, and the user interface is written in AS3.
Hope this helps...