I would like to know how Wikipedia (http://en.wikipedia.org/) creates PDFs. It seems to be using some application on the back end. Could anyone please tell me how this is done?
Thanks
Srikanth
Wikipedia runs MediaWiki.
A Google check tells me that they have two PDF extensions.
The one that is still maintained is PDF_Writer.
It doesn't use a PHP HTML→PDF generator (though some exist).
It actually does something trickier and more clever.
The PDF Writer uses the Python ReportLab libraries to generate a PDF from a
DOM derived by parsing the MediaWiki markup with the mwlib parser.
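The markup → tree → PDF pipeline described above can be illustrated with a toy sketch. This is not the mwlib or ReportLab API: the `Node` class, `parse`, and `render` are hypothetical names, the parser only recognises `== Heading ==` lines, and the "renderer" just emits (font size, text) pairs that a real PDF backend would draw.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                      # "document", "heading" or "paragraph"
    text: str = ""
    children: list = field(default_factory=list)


def parse(markup):
    """Toy parser: '== Title ==' lines become headings, other non-blank lines paragraphs."""
    root = Node("document")
    for line in markup.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("==") and line.endswith("==") and len(line) > 4:
            root.children.append(Node("heading", line.strip("=").strip()))
        else:
            root.children.append(Node("paragraph", line))
    return root


def render(root):
    """Walk the tree and emit (font_size, text) pairs for a PDF backend to draw."""
    sizes = {"heading": 16, "paragraph": 10}
    return [(sizes[child.kind], child.text) for child in root.children]


tree = parse("== History ==\nFirst paragraph.\n\n== Usage ==\nSecond paragraph.")
commands = render(tree)
for size, text in commands:
    print(size, text)
```

Working from a parsed tree rather than from rendered HTML means the PDF layout is decided from the document's structure (headings, paragraphs) instead of from browser-oriented markup.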
To confirm ZJR's answer, these are the document properties:
I have a class project to create a web app that reads and analyzes a PDF file based on keywords. What programming language can I use?
Example: I need to check for certain keywords or data in the PDF file. If the keyword or data exists, the result is true.
I usually work in JavaScript, so I can answer for that language. The thread below helped me a lot and might help you too:
extract text from pdf in Javascript
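Once the text has been pulled out of the PDF, the keyword check itself is small in any language. Here is a minimal Python sketch, assuming the extraction step has already produced a plain string; `find_keywords` and the sample text are made up for illustration.

```python
import re


def find_keywords(text, keywords):
    """Return {keyword: True/False} using whole-word, case-insensitive matching."""
    results = {}
    for kw in keywords:
        pattern = r"\b" + re.escape(kw) + r"\b"
        results[kw] = re.search(pattern, text, flags=re.IGNORECASE) is not None
    return results


# Stand-in for text extracted from a PDF by some other tool.
sample_text = "Invoice 2041\nTotal amount due: 150 EUR\nPayment terms: 30 days"
report = find_keywords(sample_text, ["invoice", "refund", "payment terms"])
print(report)
```

Whole-word matching avoids counting "terms" inside a longer word; drop the `\b` anchors if substring matches are acceptable.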
I am brand new to PDF generation and rendering, but I have a project to create a PDF template system that allows users to save a template to a database
and later generate a PDF document using the template and values from my database.
Language to use: C#
Questions
a) Is there a PDF tool out there that can help me with this, and documentation I can study to learn it?
b) Are there free tools out there for this?
c) How do I create a PDF Template? XML?
Thanks in Advance!
You should have a look at XSL-FO.
Apache has a tool (FOP) which might be helpful.
You can use PHP to create and modify PDFs. (Everything below is completely free.)
Here are two extensive tutorials on generating PDFs in PHP:
http://blog.eirikhoem.net/index.php/2008/04/28/populate-pdf-templates-with-php-fpdf-fpdi/
http://www.astahost.com/info.php/create-pdf-php_t4972.html
You can use the FPDF library located here to handle generating PDFs based off of templates.
If you are using Java, you could try Docmosis or JODReports. They work from templates and can produce PDF output dynamically based on data and those templates. Depending on your template requirements, you might also be able to use JasperReports or Apache POI. All have free versions.
If you are looking for an instant solution, take a look at http://pdfnow.com. You can upload your XSL-FO templates and generate PDFs from them with a simple web service call.
I would give jsreport a shot. You can install it on premises for free or use it online. It supports HTML -> PDF conversion using PhantomJS or XML -> PDF conversion using Apache FOP.
The idea is that you first create a report template in jsreport studio using a JavaScript templating engine such as Handlebars, and then get back a PDF by calling the jsreport API.
If you are working in C#, there is a jsreport C# SDK for it.
Note: I am the author of jsreport
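The "store a template, fill it with data later" idea that runs through these answers can be sketched without any PDF library. The example below uses Python's standard `string.Template` as a stand-in for a real templating engine; the template text and the `row` values are invented, and turning the filled text into a PDF is left to whichever rendering tool is chosen.

```python
from string import Template

# Stand-in for a template string loaded from the database.
template_from_db = Template("Invoice for $customer\nAmount due: $amount $currency")

# Stand-in for a row of values fetched from the database.
row = {"customer": "Acme Ltd", "amount": "150.00", "currency": "EUR"}

# substitute() raises KeyError if the template names a field the row lacks,
# which surfaces template/data mismatches before any PDF is produced.
filled = template_from_db.substitute(row)
print(filled)
```

Keeping the template as plain text (or XML/HTML) in the database means it can be edited by users without redeploying code.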
What I need is to read a PDF, make some transformations (generate TOC bookmarks) and write it back.
I found this http://hackage.haskell.org/package/HPDF, but it only mentions generating PDFs, not parsing them (although I could have missed it).
Haskell is chosen purely for (self)educational purposes.
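Independently of the PDF library, generating TOC bookmarks needs a step that nests a flat list of headings into an outline tree. A minimal sketch of that step follows, written in Python; the same stack-based approach carries over to Haskell. The `(level, title, page)` input format and the `nest_bookmarks` name are assumptions, levels are assumed to start at 1, and writing the outline back into the PDF is not shown.

```python
def nest_bookmarks(entries):
    """Turn flat (level, title, page) entries into a nested outline using a stack."""
    root = {"title": None, "page": None, "children": []}
    stack = [(0, root)]  # (level, node); the root sits at level 0
    for level, title, page in entries:
        node = {"title": title, "page": page, "children": []}
        # Pop until the top of the stack is a shallower heading: that is the parent.
        while stack[-1][0] >= level:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((level, node))
    return root["children"]


flat = [(1, "Introduction", 1), (2, "Motivation", 2), (2, "Scope", 3), (1, "Methods", 5)]
outline = nest_bookmarks(flat)
for top in outline:
    print(top["title"], [child["title"] for child in top["children"]])
```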
There are a few tools for PDF manipulation, though they seem to bias towards generation, rather than parsing:
http://johnmacfarlane.net/pandoc/
Pandoc is a great cross-markup library, but doesn't support PDF parsing (it does support PDF generation from a variety of formats).
There's also:
http://hackage.haskell.org/package/HsHaruPDF
http://hackage.haskell.org/package/pdf2line -- tool for extracting text from pdf
http://hackage.haskell.org/package/HPDF -- another pdf generation library
I'm not sure we have a good parsing tool yet.
Also as a learning exercise, I started a PDF parsing library in Haskell, but it's incomplete and has been languishing a bit from lack of attention. I'd be happy to share it with you, and would love feedback, improvements, etc. It's not currently hosted on hackage, but if you're interested in working with an incomplete implementation, let me know and I'll ask some colleagues for advice on getting it up there.
Here's a Haskell binding to parts of Xpdf:
http://hackage.haskell.org/package/pdf2line
Check out the pdf-toolbox library. Its support for generating PDF files is low level, but powerful enough for your task.
Here is an example of how to change the title of an existing PDF file using the incremental update feature.
Another package to consider is rakhana, which is also on Hackage.
I have a lot of different sites written in PHP (Drupal), and more and more often clients ask me to create PDFs of various lists, product descriptions and so on. I've been using dompdf and other PDF libraries, but they are a pain to use and have very limited functionality.
Are there any services out there that'll let me generate a PDF file from a URL and let the user download the result? That would definitely save my day :)
Best regards,
Thomas
If you are trying to convert HTML to PDF, then there are a couple of services out there which can do that for you (search), but off the top of my head, a2ps does a pretty good job. The basic idea is that if you can generate PostScript from your source, then creating a PDF is not an issue.
If you are looking for a more full-featured library, then iText can do it (it is Java, though, and not free for commercial use).
Is there an open source solution that displays PDFs for online reading? It has to be searchable, much like Google Books, and if possible be able to display annotations.
By "online reading" I'll assume you mean without a PDF reader plugin on the client. In that case you'll need to convert to HTML:
http://pdftohtml.sourceforge.net/
If you don't mind losing the ability to copy text, then converting to PNG may give you a more accurate rendering:
http://www.imagemagick.org/
Regardless of the output format, you can manage your searching using the original PDF data. One technology for this is mnoGoSearch:
http://www.mnogosearch.org/
mnoGoSearch uses pdftotext internally. You may find this useful if you want to write your own search routines. pdftotext is part of the Xpdf suite of utilities:
http://www.foolabs.com/xpdf/about.html
All of the tools listed above are available on Windows and Linux.
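If you do write your own search routine on top of pdftotext, a rough sketch could look like the following. It assumes `pdftotext` is installed and on the PATH, and relies on its default behaviour of separating pages with a form feed character; `extract_text` and `pages_containing` are illustrative names, and the sample string stands in for real pdftotext output.

```python
import subprocess


def extract_text(pdf_path):
    """Run pdftotext; the '-' argument sends the extracted text to stdout."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def pages_containing(text, term):
    """Return 1-based page numbers whose text contains term (case-insensitive)."""
    pages = text.split("\f")  # pdftotext ends each page with a form feed
    return [n for n, page in enumerate(pages, start=1) if term.lower() in page.lower()]


# Stand-in for pdftotext output from a three-page document.
sample = "Chapter one: whales\fChapter two: ships\fChapter three: whales again\f"
hits = pages_containing(sample, "whales")
print(hits)
```

Searching the extracted text and mapping hits back to page numbers lets the viewer jump to the right page whether the pages are displayed as HTML or as images.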
You may also be interested in the Vuzit DocuPub Platform: http://vuzit.com/products/docupub_platform
The display technology itself is not open source, but they provide an API to access their service, so perhaps it is worth investigating.
I don't know whether you are looking for software to install or a service to pay for.
I've read a lot about www.getbackboard.com (this is not advertising, only something I've read about that may fit your needs ;)
Not sure if they do annotations, but both of these will show PDFs quite well:
http://pdfmenot.com
http://docs.google.com
ICEpdf recently released their code as open source. It is Java based.
PyPdf is really nice. It supports reading the text as well as encryption, which I know iTextSharp does not.
Of course you'd have to program in Python, as IronPython's class libraries aren't quite at the point where you can reference them from another language and use them. (But I imagine they will be someday soon.)
PyPdf
This is not open source, but check it out anyway. You can download a free trial of their SDK to try it out. Reading PDFs and their annotations is not simple, and I wouldn't trust a production app to open source decoders.
Here is an online demo.
http://www.atalasoft.com/ajaxannotations/default.aspx
Another good PDF reader is Foxit Reader.