We have some tools that generate PDF. We want to automate some testing to make sure the generated PDFs are tagged (PDF/UA) and that the tags are valid.
There are a lot of interactive checkers (acrobat, PDF Accessibility Checker (PAC), etc). They generate reports of things that pass/fail in the PDF based on the matterhorn protocol. I'd like to generate these similar reports but automated.
I recently found a perl module, PDF::API2, that might be promising but I only wrote a few simple tests with perl about 15 years ago. Has anyone used that module for tagged pdf checking or have you done this with a different scripting language?
The technology used in Adobe Acrobat (in its Preflight component) is developed by callas software (caution: I'm heavily affiliated with this company). callas also develops the same technology under the name pdfaPilot, which exists in a manual version but also in command-line and SDK versions that fully automate the process.
But!
As stated by Max Wyss in his comment on your question already, there are two parts to PDF/UA checking. Some of the specification's rules can be tested automatically by software, but a lot of them cannot.
To give one example, it is possible to verify programmatically that all text in a PDF document is tagged with a language. It's a whole other ballgame to check whether those language tags are actually correct.
pdfaPilot Desktop actually allows you to automatically check what is possible, and then allows you to convert the PDF/UA file into visually tagged HTML which makes it much easier to verify that meaning and structure of the text are correct.
In other words, yes, such technology exists, but it will never be 100% complete.
I'm doing research to find 3rd party packages/solutions/widgets/back-ends to allow users of a website to upload and edit office documents online in their browser, akin to Google Docs. I haven't had much luck so wanted to see if anyone has any advice or pointers.
Integrating Google Docs is, sadly, not an option, as the documents have to remain stored on our servers only.
ckEditor (and similar) are good rich text content editors, but don't support page formatting and many other features of office documents.
I've found many libraries for converting MS Office, PDF, PS, etc. documents into a common format, but no editor widgets or front-ends that support editing that common format.
Advice on 3rd party solutions or other libraries/widgets I've missed would be really appreciated, whether OSS or proprietary.
You're asking about office productivity suites? I've personally used Zoho and thought it was okay. I'm not sure which of these offer document -> online editing -> save offline document, though.
Wikipedia has lists of these online productivity suites :
Online word processors
Online Office Suites
If you're a Volume Licensed customer of MSFT, you can deploy Office Web Apps (2010) on-premise with SharePoint 2010. TechNet subscribers can also get it - this is what I did - for testing. Everything is x64 (Web Apps, SharePoint, etc.)
I've been asked to investigate the feasibility of adding watermarks to documents when printed through our application. The documents will consist of word, pdf and cad.
The interface of the application is vb6 with a plethora of vc6 dll's.
I can see a couple of possible solutions:
Convert all documents to PDF, add a watermark and then print.
Find a print driver that will add a watermark to all documents prior to printing and install it and reenable it at runtime if it gets disabled for any reason.
3rd Party suites are possibility (we use Volo View Express for viewing CAD files) but since this application is nearing end-of-life we wouldn't want to spend too much on it.
Has anyone had any experience of the above? Any gotcha's that will bog me down?
Tracker Software has a good set of PDF api's that that will allow you to implement the solution you already have in mind. I've used their Image and PDF libraries quite a bit with a lot of success in both VB6 and .NET. Single user licenses are not expensive (depending on how you look at it I guess), and I've found support to be excellent as well.
is there an open source solution that displays PDFs for online reading? It has to be searchable much like google books and if possible has the ability to display annotations?
By "online reading" I'll assume you mean without a PDF reader plugin on the client. In that case you'll need to convert to HTML
http://pdftohtml.sourceforge.net/
If you don't mind losing the ability to copy text then converting to PNG may give you a more accurate rendering
http://www.imagemagick.org/
Regardless of the output format you can manage your searching using the original PDF data. One technology for this is mnogosearch
http://www.mnogosearch.org/
Monogosearch uses pdftotext internally, you may find this useful if you want to write your own search routines. pdftotext is part of the Xpdf suite of utilities
http://www.foolabs.com/xpdf/about.html
All of the tools listed above are available on Windows or Linux
You may also be interested in the Vuzit DocuPub Platform: http://vuzit.com/products/docupub_platform
The display technology itself is not open source, but they provide an API to access their service, so perhaps it is worth investigating.
Don't know if you are looking a software to install or some service to pay for...
I've read a lot about www.getbackboard.com (this is not advertising, only reporting something I've read about, that maybe fits your needs.. ;)
Not sure if they do annotations, but both of these will show PDFs quite well:
http://pdfmenot.com
http://docs.google.com
ICEPdf recently released their code as open source. It is Java based.
PyPdf is really nice. It supports reading the text as well as encryption which I know that itextsharp does not.
Of course you'd have to program in python as IronPython's class libraries aren't quite to the point where you can ref them from another language and use them. (But I imagine they will be someday soon)
PyPdf
This is not open source, but check it out anyways. You can download a free trial of their SDK to try it out. Reading PDF's and their annotations is not simple and I wouldn't trust a production app to open source decoders.
Here is an online demo.
http://www.atalasoft.com/ajaxannotations/default.aspx
Another good pdf reader is FoxitReader.
What are the software/ Wiki you use to write and share your specs about the developers, testers and management?
Do you use Wiki system, and if so, what Wiki software you use?
Or do you use Sharepoint to manage and version the specs? One problem with SharePoint 2003 as specs platform is that it's very hard to collaborate among different people.
For backward compatibility sake, I would also like to have the platform able to import Microsoft Word seamlessly. And it would certainly help if the interface is similar to Microsoft Word.
Any idea?
I've used Confluence at a number of places, it's a pretty powerful wiki and very good for creating specifications that can be shared amongst various parties. See:
http://www.atlassian.com/software/confluence/
There's some more information here on the advantages of using Confluence:
https://stackoverflow.com/questions/170352/confluence-experiences
EDIT: I've updated this to deal with the Microsoft Word import feature you mentioned. Confluence supports this through the Office Connector here:
http://www.atlassian.com/software/confluence/plugins/office-connector.jsp
There's also a Sharepoint connector:
http://www.atlassian.com/software/confluence/plugins/sharepoint-connector.jsp
plus a whole bunch of plugins:
http://www.atlassian.com/software/confluence/plugins/sharepoint-connector.jsp
Some of these are user contributed also. I can't recommend Confluence enough as a commercial wiki.
I've also used JSPWiki, which is open source. it's ok but not as good as confluence, see:
http://www.jspwiki.org/
You could try Google docs - I have successfully used this in the past. It supports import / export to MS Word, and it has great support for multiple user - see http://www.brighthub.com/internet/google/articles/8236.aspx.
It supports versioning, allows you to chat with other people who are currently working on the document, and shows you a list of all the changes others have made to the document (without needing to close / reopen the document).
If you want corporate support, Google also provides that - see Google Apps for business.
We use SharePoint -- it's not ideal, but it does a decent job. If I were you, I would seriously look at getting off SharePoint 2003 and on to MOSS (SharePoint 2007). It's not perfect, but it's substantially better. Here's a little bit on using MOSS as a wiki. I think in general wiki's are a good tool for getting people up to speed on your system. We used to pass around "getting started documents" and now we have all that type of stuff in our developer portal.
Per John's comment, I looked up this feature comparison. I have to go back and look at what features I'm using that are not in WSS -- I might be paying for licenses I don't need! :)
We use email. I know it isn't elaborate, but it is easy to use. Everyone has it installed and there are no licensing issues. All spec changes are sent to an super set email distro indicating the updates and the location on the network share where the spec can be found.
We use Alfresco, in its Community version, from both its Share and Explorer web interfaces.
Quite useful, with a document library, wiki, forum and calendar.
We curently host about 1.8 Go consisting mainly in docs, versionned and sometimes automatically converted to PDF (by creating an automatic content rule).
FTP, WebDav and network share are also used to access to the same repository.
You could take a look at Microsoft Groove - the collaboration software that Microsoft bought a few years back.
It's bundled free with premium versions of Microsoft Office.
You can customize the workspace with discussion boards and can fairly seamlessly store collaboratively-edited Office documents.
We use MediaWiki for dos & specs. Wiki definitely wins anything like Microsoft Word or SharePoint - it allows you to develop a documentation in "first refer, then describe" = "divide and rule" way. Perfect for developers - they used to think the same way. The process of developing a documentation is almost ideal: you start from TOC and drill down until you write the document for every link you put earlier.
MediaWiki is quite customizable - there are lots of extensions there. The most necessary ones are:
Source code highlighter - CSO_Source
Our own templates integrating wiki with class reference.
Others are InterWiki, FileProtocolLinks, YouTube (we use customized version of it to display HD video), ReCaptcha, SpecialDeleteOldRevisions, Maintenance.
Some integration examples are here.
And we use Google issue tracker to track the issues. Its main advantages:
Imput usability: the process of adding\changing the issue is really convenient there. Earlier we tried Track Studio - the same actions require 2-3 times more time there, so it died fast simply because most of us hated to use it.
Customizable grids. See the examples. Really helpful.
Atom\RSS support. So everyone knows what's going on.
There is a Gurtle tool integrating it with TortoiseSVN. Really helpful.
Its main disadvantage is that it can't be closed from the public access. This makes it simply unusable in many cases.
If you want a UI similar to Word, why not use Word with SharePoint 2007? You're on 2003 so the experience is there. Upgrade to SharePoint 2007 and you can have the collaboration, Word features, document sharing, and so on.
This is the kind of thing Microsoft wants people to use Office for, so there's a ton of doco out there about how to configure your SharePoint and Office environment to support collaboration.
There is something that Google do in this direction and it looks really cool: wave.google.com. It would be a great step in collaboration and worth to wait it.
Here we use Google Docs it makes the documents available to everyone write or read only, public or private among people that have or not Google accounts, it also can import Word docs, not to mention that it runs directly into the browser so it has high availability with zero cost and zero setup, also its computer/OS agnostic, we have a nice experience with it.
Also perhaps you should take a look at Basecamp or Backpack at 37Signals, any of then might also fit your bill.
We use DocBook for all of our specifications (and other customer-facing documentation). DocBook is an XML format that lets you easily generate documents in just about any format, including PDF, which is how we distribute things to clients to get them signed off. We can divide a document into files (by section) and commit everything to our source control system (Subversion). Because it is all XML (i.e. text-based), Subversion's automatic merging and conflict resolution works great if two people work on the same file. We have a set of stylesheets that all of our documents use, so all documents share the exact same style/format, with no extra work on our part.
And if you don't like editing XML files directly, there are GUI front-ends that provide a reasonably WYSIWYG-like experience. I believe that most people in my office use XMLMind. Still, we happen to all be technical people so if we had to write XML directly it wouldn't be an issue.
As a sidenote, we also put out release notes. We have some XSLT that lets us write documents like this:
<bugs>
<bug id="1234" component="web">JavaScript error when clicking the Kick Me button</bug>
</bugs>
We then have a script that runs through our Subversion repository doing an svn log from the previous release tag to the current release tag, and some Bugzilla integration to automatically generate release notes on-the-fly.
(also, for most internal-only documentation, we use MediaWiki, which is also a great way to collaborate.)
We use OnTime. It was originally only used for defect tracking, but we've started using it to track features as well. These can be used to document the feature as it evolves during development. Features can be grouped together into sprints or releases, and time can be tracked against each feature. If you are using SCRUM, you can also plot burn-down charts for each sprint. It also has wiki functionality.