How to make a web app that reads and analyzes a PDF file? - pdf

I have a project from my lecture to create a web app that reads and analyzes a PDF file based on keywords. What programming language can I use?
Example: I need to check whether certain keywords or data appear in the PDF file. If the keyword or data exists, the result is true.

I usually work in JavaScript, so I can answer in that. The conversation below helped me a lot, and it might help you too.
extract text from pdf in Javascript
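The keyword check described in the question can be sketched independently of the extraction step. This is a minimal sketch in Python, assuming the PDF text has already been extracted by some library; the function name find_keywords and the sample text are hypothetical and only illustrate the true/false result the question asks for.

```python
import re


def find_keywords(text, keywords):
    """Return a dict mapping each keyword to True if it occurs in text."""
    # Collapse runs of whitespace so keywords split across line breaks still match.
    normalized = re.sub(r"\s+", " ", text).casefold()
    return {
        kw: re.sub(r"\s+", " ", kw).casefold() in normalized
        for kw in keywords
    }


# Hypothetical text, standing in for what a PDF text extractor would return.
sample_text = "Invoice No. 1024\nTotal   amount due: 500\nPayment received"
result = find_keywords(sample_text, ["invoice", "amount due", "refund"])
print(result)  # {'invoice': True, 'amount due': True, 'refund': False}
```

The match is a case-insensitive substring test, so "invoice" would also match inside "invoices"; a word-boundary regex would be stricter if that matters.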

Related

Can I create a software which reads PDF files and creates folders?

I want to create software in Visual Basic that reads some text in a PDF file (the name on an invoice) and then creates a folder using that name. Is this possible, and how would I get started? I have some past programming experience.
PDFs are difficult to manipulate. To do it efficiently, you'd need libraries that let you open the PDFs and extract the text from them.
I haven't used VB much, but I don't expect that there will be much support for PDFs.
You are probably better off using a language like Python, which has a lot of support for PDFs.
See for instance:
- http://www.binpress.com/tutorial/manipulating-pdfs-with-python/167
- https://pypi.python.org/pypi?:action=search&term=parse+pdf&submit=search
The first link also contains a few tutorials.
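Once the text is out of the PDF, the folder-creation part is short. The following is a rough sketch, assuming the extracted invoice text contains a line of the form "Name: ..." (an assumed layout, real invoices will differ); extract_name, safe_folder_name and create_folder_for_invoice are hypothetical helper names, and the extraction itself is left to whichever PDF library you pick.

```python
import re
import tempfile
from pathlib import Path


def extract_name(text):
    """Return the name from a line like 'Name: Jane Doe', or None if absent."""
    # Assumed invoice layout; adjust the pattern to the real documents.
    match = re.search(r"^Name:[ \t]*(.+)$", text, flags=re.MULTILINE)
    return match.group(1).strip() if match else None


def safe_folder_name(name):
    """Replace characters that are not allowed in Windows folder names."""
    cleaned = re.sub(r'[<>:"/\\|?*]', "_", name).strip(" .")
    return cleaned or "unnamed"


def create_folder_for_invoice(text, base_dir):
    """Create base_dir/<name> for the name found in text; return the path or None."""
    name = extract_name(text)
    if name is None:
        return None
    folder = Path(base_dir) / safe_folder_name(name)
    folder.mkdir(parents=True, exist_ok=True)
    return folder


# Hypothetical extracted text; a temporary directory keeps the demo side-effect free.
sample_text = "INVOICE 1024\nName: Jane Doe\nTotal: 500"
with tempfile.TemporaryDirectory() as base:
    folder = create_folder_for_invoice(sample_text, base)
    folder_name = folder.name
    folder_existed = folder.is_dir()
print(folder_name, folder_existed)
```

Sanitizing the name before using it as a path matters here, because text pulled from a document may contain slashes or other characters the file system rejects.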

pdf and netbeans

I am using NetBeans to develop my project. I need to put a PDF on a website and, according to certain conditions, only parts of the PDF should be viewable. For example, suppose the payment made is Rs. 500. I will let the user view 2 chapters of the PDF for a period of one week. I have no idea how to do this. Can someone help me?
Since you're using NetBeans, I'll assume that you're using Java as your language.
You generate PDFs using either XSL-FO or something like iText. I prefer XSL-FO and Velocity templates, but your situation might be different.
The rules for what to display under different conditions need to be expressed in Java using controllers that accept a request, bind parameters to objects, execute rules, and stream the response as a PDF depending on the outcome. It's not an easy answer.
There are various Java viewers and several have plugins (there is one at http://www.jpedal.org/support_siNetBeans.php).
Your best bet would be to generate a copy of the PDF with just the pages allowed and then display that.
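The rule side of this (which pages a user may see, and for how long) can be sketched separately from the PDF handling. The sketch below is in Python for brevity rather than Java, and everything in it is an assumption for illustration: the chapter page ranges, the price of 250 per chapter (chosen so that 500 buys two chapters, as in the question), and the function names. Producing the trimmed copy of the PDF from the returned page list is left to a PDF library.

```python
from datetime import date, timedelta

# Hypothetical chapter layout: chapter number -> (first_page, last_page), 1-based.
CHAPTER_PAGES = {1: (1, 12), 2: (13, 30), 3: (31, 55)}


def chapters_for_payment(amount):
    """Number of chapters unlocked; assumes 250 per chapter (500 buys two)."""
    return min(len(CHAPTER_PAGES), int(amount // 250))


def allowed_pages(amount, paid_on, today, access_days=7):
    """Return the 1-based page numbers the user may view on the given day."""
    # Access is valid from the payment date for access_days days.
    if today < paid_on or today >= paid_on + timedelta(days=access_days):
        return []
    pages = []
    for chapter in range(1, chapters_for_payment(amount) + 1):
        first, last = CHAPTER_PAGES[chapter]
        pages.extend(range(first, last + 1))
    return pages


pages = allowed_pages(500, paid_on=date(2024, 1, 1), today=date(2024, 1, 3))
print(pages[0], pages[-1], len(pages))  # 1 30 30
```

Keeping the rule in one small function makes it easy to call from a controller before deciding which copy of the document to stream.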

How does Wikipedia generate PDFs

I would like to know how Wikipedia (http://en.wikipedia.org/) creates PDFs. It seems to be using some application at the back end. Could anyone please let me know how this is done?
Thanks
Srikanth
Wikipedia runs MediaWiki.
A Google check tells me that they have two PDF extensions.
This is the one that is still maintained: PDF_Writer
It doesn't use a PHP HTML→PDF generator (though there are some).
It actually does something trickier and more clever.
The PDF Writer uses the Python ReportLab libraries to generate PDF based on a DOM derived from parsing MediaWiki markup using the mwlib parser.
To confirm ZJR's answer, these are the document properties:

Are there any services out there that will let me convert a URL to PDF and let the user download the result?

I have a lot of different sites written in PHP (Drupal), and more and more often clients ask me to create PDFs of various lists, product descriptions and so on. I've been using dompdf and other PDF libraries, but they are a pain to use and have very limited functionality.
Are there any services out there that'll let me generate a PDF file from a URL and let the user download the result? That would definitely save my day :)
Best regards,
Thomas
If you are trying to convert HTML to PDF, then there are a couple of services out there which can do that for you (search), but off the top of my head a2ps does a pretty OK job. The basic idea is that if you can generate PostScript from your source, then creating a PDF is not an issue.
If you are looking for a more full-featured library, then iText can do it (Java though, and not free for commercial use).

A technology for reading pdfs online with annotations?

Is there an open source solution that displays PDFs for online reading? It has to be searchable, much like Google Books, and if possible be able to display annotations.
By "online reading" I'll assume you mean without a PDF reader plugin on the client. In that case you'll need to convert to HTML:
http://pdftohtml.sourceforge.net/
If you don't mind losing the ability to copy text, then converting to PNG may give you a more accurate rendering:
http://www.imagemagick.org/
Regardless of the output format you can manage your searching using the original PDF data. One technology for this is mnogosearch
http://www.mnogosearch.org/
mnoGoSearch uses pdftotext internally; you may find this useful if you want to write your own search routines. pdftotext is part of the Xpdf suite of utilities:
http://www.foolabs.com/xpdf/about.html
All of the tools listed above are available on Windows or Linux
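If you do write your own search routine on top of pdftotext output, a simple starting point is a per-page lookup. This is only a sketch in Python: it relies on pdftotext separating pages with a form feed character in its text output (its default behaviour unless page breaks are turned off), and search_pages and the sample string are hypothetical stand-ins for real output read from a file.

```python
def search_pages(text, query):
    """Return 1-based page numbers whose text contains query (case-insensitive)."""
    needle = query.casefold()
    if not needle:
        return []
    # pdftotext ends each page with a form feed, so splitting on it yields pages.
    pages = text.split("\f")
    return [
        number
        for number, page in enumerate(pages, start=1)
        if needle in page.casefold()
    ]


# Hypothetical three-page output, standing in for a real pdftotext result.
sample = "Intro to PDF search\fIndexing with pdftotext\fSearch results and ranking\f"
hits = search_pages(sample, "search")
print(hits)  # [1, 3]
```

Returning page numbers lets the viewer jump to the matching page regardless of whether the pages are shown as HTML or as images.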
You may also be interested in the Vuzit DocuPub Platform: http://vuzit.com/products/docupub_platform
The display technology itself is not open source, but they provide an API to access their service, so perhaps it is worth investigating.
I don't know if you are looking for software to install or a service to pay for...
I've read a lot about www.getbackboard.com (this is not advertising, only reporting something I've read about that may fit your needs ;)
Not sure if they do annotations, but both of these will show PDFs quite well:
http://pdfmenot.com
http://docs.google.com
ICEPdf recently released their code as open source. It is Java based.
PyPdf is really nice. It supports reading the text as well as encryption, which I know that iTextSharp does not.
Of course you'd have to program in Python, as IronPython's class libraries aren't quite to the point where you can reference them from another language and use them. (But I imagine they will be someday soon.)
PyPdf
This is not open source, but check it out anyway. You can download a free trial of their SDK to try it out. Reading PDFs and their annotations is not simple, and I wouldn't trust a production app to open source decoders.
Here is an online demo.
http://www.atalasoft.com/ajaxannotations/default.aspx
Another good pdf reader is FoxitReader.