Convert internal wiki page into PDF

My team uses internal wiki pages for all kinds of stuff. The pages are created with MediaWiki. I wonder if there is any way to convert the wiki pages into PDF format. I need this to convert the user documentation to PDF so that it can be shipped with the next release. I have seen the 'Download as PDF' option on Wikipedia, but our internal wiki does not have it. Is there any plugin available that would allow me to do the conversion?

The Collection Extension has been decommissioned. The last long-term support version supporting it will reach end of life in June 2019 according to
https://www.mediawiki.org/wiki/Version_lifecycle.
The Wikimedia foundation has stopped any development of a replacement after some failed attempts according to https://www.mediawiki.org/w/index.php?title=Topic:Uxkv0ib36m3i8vol&topic_showPostId=uxsjbpkqfmgq1jyx#flow-post-uxsjbpkqfmgq1jyx.
Closed source alternatives are under development by at least two companies. A working open source solution is available as a Debian package, Windows binary, and online converter http://mediawiki2latex.wmflabs.org/, which has been developed by a German worker in a cowshed using the Haskell programming language.

Wikipedia uses the Collection extension with OfflineContentGenerator (OCG) for this purpose.

Related

Show pdf with qtwebkit

I have a project in which we use custom built software, that is not developed by us.
The application is developed in qt 4.7.0 and is running on Ubuntu 10.04 LTS.
It uses html pages to provide the "online help" to the user. My task is to write the initial help content. The pages are rendered using qtwebkit.
Our customer would also like to display pdf-documents. When I asked the developers, I was told to convert the pdfs to html and add the converted files to the online help.
This would cause quite a bit of additional work and results in html-output that won't look exactly like the pdf-files... and it would prevent the simple addition of new pdf-files by the customer.
So I ask the community here: is there a way to display pdf-files with qtwebkit? Are there any plugins?
Cheers,
10.6um
I don't know of an existing plugin, but I think you could implement a PDF viewer plugin yourself with the help of Poppler, which is portable across Windows, Linux, and Mac:
1. Reimplement QWebPluginFactory and intercept requests with the PDF MIME type.
2. Download the PDF content, and use Poppler to render the PDF data to a QPixmap.
3. Set the QPixmap as the content of the QWebPage.

On-demand publishing api?

Lulu.com is a "self-publishing" service.
It appears that they used to have an api, but it is no longer offered.
I was wondering if there are any similar services in which I could use a scripting language to generate book documents (pdf, xml, whatever), and programmatically place individual orders for custom books.
There is probably not an API out there for auto-converting a non-ebook format to an ebook format.
I recently attended a discussion on publishing at a software development conference. There was much discussion regarding the process of generating/converting documents to EPUB and other e-book formats. Apparently, it's not as simple as clicking "convert." There is a lot of manual typesetting and tweaking that has to be done to get a PDF (or .DOC, or whatever) to look right as an e-book.

A technology for reading pdfs online with annotations?

Is there an open source solution that displays PDFs for online reading? It has to be searchable, much like Google Books, and if possible it should be able to display annotations.
By "online reading" I'll assume you mean without a PDF reader plugin on the client. In that case you'll need to convert to HTML:
http://pdftohtml.sourceforge.net/
If you don't mind losing the ability to copy text then converting to PNG may give you a more accurate rendering
http://www.imagemagick.org/
Regardless of the output format you can manage your searching using the original PDF data. One technology for this is mnogosearch
http://www.mnogosearch.org/
mnoGoSearch uses pdftotext internally; you may find this useful if you want to write your own search routines. pdftotext is part of the Xpdf suite of utilities:
http://www.foolabs.com/xpdf/about.html
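If you do go the route of writing your own search routine on top of pdftotext, a minimal sketch might look like the following. It assumes `pdftotext` is installed and on the PATH, and it relies on pdftotext ending each page with a form-feed character. The function names (`extract_pages`, `split_pages`, `build_index`, `search`) are invented for this example and are not part of any of the tools above.

```python
import re
import subprocess
from collections import defaultdict


def extract_pages(pdf_path):
    """Run pdftotext (assumed to be on the PATH) and return one string per page."""
    # "-" as the output file makes pdftotext write the text to stdout.
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", pdf_path, "-"],
        check=True, capture_output=True, encoding="utf-8",
    )
    return split_pages(result.stdout)


def split_pages(text):
    """pdftotext ends each page with a form feed, so split on it."""
    pages = text.split("\f")
    # Drop the empty chunk left after the final form feed, if any.
    if pages and not pages[-1].strip():
        pages.pop()
    return pages


def build_index(pages):
    """Map each lowercased word to the set of 1-based page numbers containing it."""
    index = defaultdict(set)
    for number, page in enumerate(pages, start=1):
        for word in re.findall(r"\w+", page.lower()):
            index[word].add(number)
    return index


def search(index, query):
    """Return the sorted page numbers that contain every word of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)
```

Typical use would be `search(build_index(extract_pages("manual.pdf")), "annotation")`, where `manual.pdf` is a placeholder file name. Because the index stores page numbers, the hits can be mapped back to whichever rendering (HTML or PNG) you show to the reader.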
All of the tools listed above are available on Windows or Linux.
You may also be interested in the Vuzit DocuPub Platform: http://vuzit.com/products/docupub_platform
The display technology itself is not open source, but they provide an API to access their service, so perhaps it is worth investigating.
I don't know if you are looking for software to install or some service to pay for...
I've read a lot about www.getbackboard.com (this is not advertising, only reporting something I've read about, that maybe fits your needs.. ;)
Not sure if they do annotations, but both of these will show PDFs quite well:
http://pdfmenot.com
http://docs.google.com
ICEPdf recently released their code as open source. It is Java based.
PyPdf is really nice. It supports reading the text as well as encryption which I know that itextsharp does not.
Of course you'd have to program in python as IronPython's class libraries aren't quite to the point where you can ref them from another language and use them. (But I imagine they will be someday soon)
PyPdf
This is not open source, but check it out anyway. You can download a free trial of their SDK to try it out. Reading PDFs and their annotations is not simple, and I wouldn't trust a production app to open source decoders.
Here is an online demo.
http://www.atalasoft.com/ajaxannotations/default.aspx
Another good pdf reader is FoxitReader.

Software/Platform to Share Specs

What software or wiki do you use to write specs and share them among developers, testers, and management?
Do you use a wiki system, and if so, which wiki software do you use?
Or do you use Sharepoint to manage and version the specs? One problem with SharePoint 2003 as specs platform is that it's very hard to collaborate among different people.
For the sake of backward compatibility, I would also like the platform to be able to import Microsoft Word documents seamlessly. And it would certainly help if the interface were similar to Microsoft Word.
Any idea?
I've used Confluence at a number of places, it's a pretty powerful wiki and very good for creating specifications that can be shared amongst various parties. See:
http://www.atlassian.com/software/confluence/
There's some more information here on the advantages of using Confluence:
https://stackoverflow.com/questions/170352/confluence-experiences
EDIT: I've updated this to deal with the Microsoft Word import feature you mentioned. Confluence supports this through the Office Connector here:
http://www.atlassian.com/software/confluence/plugins/office-connector.jsp
There's also a Sharepoint connector:
http://www.atlassian.com/software/confluence/plugins/sharepoint-connector.jsp
plus a whole bunch of plugins:
http://www.atlassian.com/software/confluence/plugins/sharepoint-connector.jsp
Some of these are user contributed also. I can't recommend Confluence enough as a commercial wiki.
I've also used JSPWiki, which is open source. It's OK but not as good as Confluence, see:
http://www.jspwiki.org/
You could try Google Docs - I have successfully used this in the past. It supports import / export to MS Word, and it has great support for multiple users - see http://www.brighthub.com/internet/google/articles/8236.aspx.
It supports versioning, allows you to chat with other people who are currently working on the document, and shows you a list of all the changes others have made to the document (without needing to close / reopen the document).
If you want corporate support, Google also provides that - see Google Apps for business.
We use SharePoint -- it's not ideal, but it does a decent job. If I were you, I would seriously look at getting off SharePoint 2003 and on to MOSS (SharePoint 2007). It's not perfect, but it's substantially better. Here's a little bit on using MOSS as a wiki. I think in general wikis are a good tool for getting people up to speed on your system. We used to pass around "getting started documents" and now we have all that type of stuff in our developer portal.
Per John's comment, I looked up this feature comparison. I have to go back and look at what features I'm using that are not in WSS -- I might be paying for licenses I don't need! :)
We use email. I know it isn't elaborate, but it is easy to use. Everyone has it installed and there are no licensing issues. All spec changes are sent to a superset email distribution list, indicating the updates and the location on the network share where the spec can be found.
We use Alfresco, in its Community version, from both its Share and Explorer web interfaces.
Quite useful, with a document library, wiki, forum and calendar.
We currently host about 1.8 GB, consisting mainly of documents that are versioned and sometimes automatically converted to PDF (by creating an automatic content rule).
FTP, WebDAV and a network share are also used to access the same repository.
You could take a look at Microsoft Groove - the collaboration software that Microsoft bought a few years back.
It's bundled free with premium versions of Microsoft Office.
You can customize the workspace with discussion boards and can fairly seamlessly store collaboratively-edited Office documents.
We use MediaWiki for docs & specs. A wiki definitely beats anything like Microsoft Word or SharePoint: it lets you develop documentation in a "first refer, then describe" (that is, "divide and rule") way. That is perfect for developers, who are used to thinking the same way. The process of developing documentation is almost ideal: you start from the TOC and drill down until you have written the document for every link you put in earlier.
MediaWiki is quite customizable - there are lots of extensions there. The most necessary ones are:
Source code highlighter - CSO_Source
Our own templates integrating wiki with class reference.
Others are InterWiki, FileProtocolLinks, YouTube (we use customized version of it to display HD video), ReCaptcha, SpecialDeleteOldRevisions, Maintenance.
Some integration examples are here.
And we use Google issue tracker to track the issues. Its main advantages:
Input usability: the process of adding\changing an issue is really convenient there. Earlier we tried Track Studio - the same actions take 2-3 times longer there, so it died fast simply because most of us hated using it.
Customizable grids. See the examples. Really helpful.
Atom\RSS support. So everyone knows what's going on.
There is a Gurtle tool integrating it with TortoiseSVN. Really helpful.
Its main disadvantage is that it can't be closed off from public access. This makes it simply unusable in many cases.
If you want a UI similar to Word, why not use Word with SharePoint 2007? You're on 2003 so the experience is there. Upgrade to SharePoint 2007 and you can have the collaboration, Word features, document sharing, and so on.
This is the kind of thing Microsoft wants people to use Office for, so there's a ton of doco out there about how to configure your SharePoint and Office environment to support collaboration.
Google is working on something in this direction that looks really cool: wave.google.com. It would be a great step forward in collaboration and is worth waiting for.
Here we use Google Docs. It makes documents available to everyone, read-write or read-only, public or private, whether or not they have Google accounts. It can also import Word docs, and because it runs directly in the browser it offers high availability with zero cost and zero setup. It is also computer/OS agnostic. We have had a good experience with it.
Also, perhaps you should take a look at Basecamp or Backpack from 37signals; either of them might also fit the bill.
We use DocBook for all of our specifications (and other customer-facing documentation). DocBook is an XML format that lets you easily generate documents in just about any format, including PDF, which is how we distribute things to clients to get them signed off. We can divide a document into files (by section) and commit everything to our source control system (Subversion). Because it is all XML (i.e. text-based), Subversion's automatic merging and conflict resolution works great if two people work on the same file. We have a set of stylesheets that all of our documents use, so all documents share the exact same style/format, with no extra work on our part.
And if you don't like editing XML files directly, there are GUI front-ends that provide a reasonably WYSIWYG-like experience. I believe that most people in my office use XMLMind. Still, we happen to all be technical people so if we had to write XML directly it wouldn't be an issue.
As a sidenote, we also put out release notes. We have some XSLT that lets us write documents like this:
<bugs>
<bug id="1234" component="web">JavaScript error when clicking the Kick Me button</bug>
</bugs>
We then have a script that runs through our Subversion repository doing an svn log from the previous release tag to the current release tag, and some Bugzilla integration to automatically generate release notes on-the-fly.
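To give a feel for the kind of processing involved, here is a minimal sketch that turns the `<bugs>` format above into plain-text release notes. Note that it uses Python's standard `xml.etree.ElementTree` module rather than the XSLT described in this answer, and it leaves out the Subversion and Bugzilla steps entirely. The function name `render_release_notes`, the output layout, and the fallback values for missing attributes are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# The sample document from the answer above.
SAMPLE = """<bugs>
<bug id="1234" component="web">JavaScript error when clicking the Kick Me button</bug>
</bugs>"""


def render_release_notes(xml_text):
    """Turn a <bugs> document into one plain-text line per bug."""
    root = ET.fromstring(xml_text)
    lines = []
    for bug in root.findall("bug"):
        # Fallbacks are hypothetical; the real format may require both attributes.
        bug_id = bug.get("id", "?")
        component = bug.get("component", "general")
        summary = (bug.text or "").strip()
        lines.append(f"[{component}] Bug {bug_id}: {summary}")
    return "\n".join(lines)


print(render_release_notes(SAMPLE))
# prints: [web] Bug 1234: JavaScript error when clicking the Kick Me button
```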
(also, for most internal-only documentation, we use MediaWiki, which is also a great way to collaborate.)
We use OnTime. It was originally only used for defect tracking, but we've started using it to track features as well. These can be used to document the feature as it evolves during development. Features can be grouped together into sprints or releases, and time can be tracked against each feature. If you are using SCRUM, you can also plot burn-down charts for each sprint. It also has wiki functionality.

Is there a free way to convert RTF to PDF?

How can I programmatically convert RTF documents to PDF?
OpenOffice.org can be run in server mode (i.e. without any GUI), can read RTF files and can output PDF files.
You have a number of options depending on:
- the platform(s) your application will be running on
- whether your application will be a server application (e.g. a web service that you set up once and then it runs), or a widely-available desktop application (e.g. something that must be easily downloadable and installable by many people)
- whether you are willing to put little or more programming effort into getting the solution to work
- whether you are flexible as to the programming language you will use
Here are some options:
1. PDFCreator + COM
   - Windows only
   - suitable for both desktop and server applications
   - medium programming effort
   - any language that allows you to speak COM
2. OpenOffice ( + JODConverter - optional )
   - Cross-platform (Windows, Linux, etc.)
   - suitable for server applications, as OpenOffice is a 100MB+ download
   - low programming effort
   - Java (if using JODConverter), or any language that can interface with OpenOffice's UNO
3. IText + Apache POI
   - Cross-platform (Windows, Linux, etc.)
   - suitable for both desktop and server applications
   - high programming effort
   - Java
EDIT
Here is an older post that has some commonality with your question.
EDIT 2
I see from your comments that you are on Linux and open to either C++ or Java. Definitely use option 2.
JODConverter (Java): the library takes care of spawning OpenOffice in headless mode and talking Uno to it on your behalf. You provide JODConverter with an input and output file name as well as the input and output types (e.g. rtf and pdf), and when it returns to you the output file is ready.
C++: you can fork+exec one (or more, for load balancing) OpenOffice instances in headless mode (soffice will listen for UNO requests on a socket e.g. port 8100.) From your application use Uno/CPP to instruct OpenOffice to perform the conversion the same way JODConverter does (see the JODConverter source code for how to do this.)
# The -accept argument is quoted: unquoted semicolons would end the shell command.
/opt/openoffice.org3/program/soffice.bin \
"-accept=socket,host=127.0.0.1,port=8100;urp;" \
-headless -nocrashreport -nodefault \
-nolockcheck -nologo -norestore
I am successfully using JODConverter from a Java app to convert miscellaneous document types (some documents dynamically generated from templates) to pdf.
Four years late to the party here, but I use Ted in my web application. I generate RTF programmatically, then use the rtf2pdf.sh script included in the package to generate the PDF. I tried OOo and unoconv previously, but Ted proved faster and more reliable in my application.
Use PDFCreator, a free pdf printer. Just print to pdf. You can control this through COM. Example code is in the COM folder of the install directory.
PDFCreator for Windows is the easiest for single documents.
It's also possible to automate PDF creation for large sets of documents by converting them to XML and using XSLT and XSL-FO. There are lots of tutorials for this out there.
For a specific language, such as Python, libraries exist to output to PDF fairly trivially.
The only advantage of XML over other simpler solutions is extensibility. You could also programmatically output your document in RTF, HTML, TXT, or just about any other text format.
LibreOffice can convert RTF documents to PDF via command line.
Here are the instructions to install it on CentOS.
And this is an example to initiate conversion from PHP code:
<?php shell_exec('libreoffice4.2 --headless --invisible --norestore --convert-to pdf test.rtf'); ?>
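The same call can be made from other languages. As a rough sketch, here is a Python version, assuming the LibreOffice launcher is available on the PATH as `soffice` (the PHP example above uses the version-specific name `libreoffice4.2`; adjust the `binary` argument to match your installation). The helper names `build_convert_command`, `expected_pdf_path` and `convert_rtf_to_pdf` are invented for this example.

```python
import subprocess
from pathlib import Path


def build_convert_command(rtf_path, out_dir=".", binary="soffice"):
    """Build the argument list for a headless LibreOffice RTF-to-PDF conversion."""
    return [
        binary,
        "--headless",
        "--norestore",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(rtf_path),
    ]


def expected_pdf_path(rtf_path, out_dir="."):
    """LibreOffice names the output after the input file, with a .pdf suffix."""
    return Path(out_dir) / (Path(rtf_path).stem + ".pdf")


def convert_rtf_to_pdf(rtf_path, out_dir=".", binary="soffice", timeout=120):
    """Run the conversion and return the path where the PDF should appear."""
    # Raises CalledProcessError on a non-zero exit code and
    # TimeoutExpired if LibreOffice hangs.
    subprocess.run(
        build_convert_command(rtf_path, out_dir, binary),
        check=True,
        timeout=timeout,
    )
    return expected_pdf_path(rtf_path, out_dir)
```

Passing the arguments as a list, rather than as one shell string, means file names containing spaces or shell metacharacters need no extra quoting.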
PrimoPDF. It acts as a virtual printer, so you just print to it, and out pops a PDF.
Look at PDF Printer