Is there any solution for automatic PDF remediation to comply with accessibility requirements?

In my organization, we are dealing with a huge number of PDF files (100,000+) that must be remediated to be compliant with WCAG 2.0 requirements. In the short time available, there is no way we can remediate all of those files, due to lack of resources and budget. Hence, we are looking for tools, techniques, or best practices to get the job done. As a developer who understands agile software development, my approach is to start fixing some issues programmatically wherever possible. For example, we could probably develop and run a tool to add an appropriate Author to all PDF files. I have no experience in accessibility remediation, so I'm not sure if my approach is correct or if there are any sophisticated tools already available to partially remediate PDF files in bulk.
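For example, here is a rough sketch of the kind of bulk metadata fix I have in mind, written in Python with the pikepdf library (the library choice and folder layout are just assumptions on my part, and setting the Author is of course only one small piece of WCAG compliance):

    import pathlib
    import pikepdf

    # Walk a hypothetical folder tree of PDFs and set the Author field
    # via XMP metadata (dc:creator maps to the classic PDF 'Author' entry;
    # pikepdf also updates the DocumentInfo dictionary by default).
    for path in pathlib.Path("pdfs").rglob("*.pdf"):
        with pikepdf.open(path, allow_overwriting_input=True) as pdf:
            with pdf.open_metadata() as meta:
                meta["dc:creator"] = ["Our Organization"]
            pdf.save(path)  # overwrite in place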
Any suggestion or guidance would be much appreciated.

Related

PDF Comparison Tool to test Bills

I work for a telecommunications company as a Test Engineer. As part of my job, I need to run a regression test to compare bills for each production drop. Could someone please suggest tools to compare PDF bills from the past release to the current release? The tool should be able to compare bill format, line spacing, charges, messages, etc.
This is a very broad question. I would suggest using something like PDFSharp to analyse your PDFs. The rest is largely an implementation exercise.
It will take a bit of code to get it working to a reasonable degree of accuracy.
You would pretty much need to code the thing from scratch. The good thing is that PDFSharp (or similar libraries) will take the pain out of analysing the PDF files.
Another way to solve this problem could be to transform your PDF into images and then automate some visual comparison on them. There are a couple of tools out there for such tasks (a rough do-it-yourself sketch follows the list):
testAPI
Sikuli
PerceptualDiff
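If you go the image route, a bare-bones sketch of the idea in Python, using the pdf2image and Pillow libraries (my assumption, not a recommendation from experience; pdf2image needs poppler installed, and the pages are assumed to render at the same size):

    from pdf2image import convert_from_path  # requires poppler
    from PIL import ImageChops

    # Render both bills to page images at the same resolution.
    old_pages = convert_from_path("bill_old.pdf", dpi=150)
    new_pages = convert_from_path("bill_new.pdf", dpi=150)

    for i, (old, new) in enumerate(zip(old_pages, new_pages), start=1):
        # Pixel-wise difference; getbbox() is None when pages are identical.
        diff = ImageChops.difference(old.convert("RGB"), new.convert("RGB"))
        if diff.getbbox() is not None:
            print(f"page {i} differs in region {diff.getbbox()}")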

What are good and bad ways to document a software project? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
I'm responsible for finding a good way to document the software project I'm working on.
What things are important to document? Should documentation of code and design mainly live in the code, in the form of comments? Should we put text files or Word documents directly in source control together with the code? Should we use a wiki?
Factors to think about include how easy it is for the current team to create the documentation, and how easy it is for other developers to find, correct and extend the documentation later. My experience from many projects is that developers tend to not write documentation because the system for writing it is too complex or developer unfriendly, and that after a few years, new developers can hardly find the little documentation that was written.
I'm interested in what approaches you have used in similar projects. What worked well, what did not work well, and why?
Some key facts about the project:
The platform is C# and .NET.
We use Visual Studio and Team Foundation Server for source control and work item (task) management.
We use Scrum and test-driven development and are inspired by domain-driven design.
The software consists of a collection of web services and two GUI clients.
Other clients are going to integrate with the web services in the future. The integration will be done by other developers on other teams (so the web services form a kind of API).
SharePoint is heavily used throughout the development environment. Most projects have a SharePoint site, including ours.
On our project's SharePoint site we currently have a bunch of MS Office documents on things like requirements, design, presentations for stakeholders etc. Keeping everything up to date is hard.
We also have a SharePoint wiki for the development team only, where we document things in an unstructured manner as we go along. Examples include how our build scripts are organized, our testing policy, coding guidelines.
The software is an in-house application in a fairly big financial institution.
The software is developed by a team of six people over a period of ~1 year.
The developers are consultants hired in for this project only, and will not be available to help in the future (unless the client decides to pay for it).
The client has few guidelines for how this kind of project should be documented.
I think the most important things to document are the decisions. This goes for everything from requirements to architectural choices. What are the requirements of module X? How are these requirements represented in the architecture? Why did you choose architectural pattern A over B? What are the benefits? The same goes for source code: it is common knowledge that commenting the why is way better than the how.
How you document these decisions does not matter that much in my opinion, whether you use a Wiki or a Requirements document made in Word. More important is that these documents are always up-to-date and that it is easy for anyone to access them. This can be achieved by using a wiki, or placing the documents under source control, as you say. If only a few have access to them, they are more likely not to get updated, and not to be read when necessary.
We use a Wiki for our current project and it works very well. It is easy to access for anyone (developers, managers, and customers) and a history can track changes, so you know what has been changed and why. Furthermore, we try to document the code in a meaningful way and document the major design decisions. We try not to document too much, e.g. minor things, as it is always hard to keep those things up-to-date and it is not worth the effort, imho.
Worse for me than a lack of documentation is an excess of documentation.
Keep in mind that yes: it's really important to document your project, but also that the major part of your documentation is always at risk of never being read at all.
So, I think a good starting point consists in thinking of your documentation more as something you may use to introduce new developers to your project than as an over-detailed description of the inner workings of your software.
G'day,
Definitely use a wiki. I'd recommend TWiki as it's an excellent and extensive implementation of a wiki without being too complicated to install and manage.
Here's a couple of initial thoughts.
Categories:
Start off with an initial ontology of what you want to capture but
allow people to add new categories or sub-categories as required,
allow people to retitle (sub-)categories as required, maybe with team agreement, so you don't get fragmentation from multiple names for basically the same thing.
let any initial (sub-)categories wither and die if they are left empty. Do this at the end of the project as some areas may only have entries towards the end of a project.
Tagging:
Start using a tag cloud. BTW here's an excellent plug-in available for TWiki to start classifying content early on in the project. Retrofitting tags is almost impossible to do. Starting tagging early also allows people to search for information that may be there already rather than having the same info located in multiple places.
HTH I'll come back and add more points as I think of them.
First and most important, have the comments written in such a way that NDoc can parse them. This is the best way to have the code itself documented, as the developers have to change their development practices very little, and you can generate pages that explain the code without having to look at the code.
Second, getting developers to write documentation is not easy, and getting them to do it might be an exercise in futility. This is where products like FogBugz come into play. They will help manage the development with tickets, help track check-ins, and, when you're done with an iteration, generate release notes.
In conclusion, your best bet is to find the most effective solution that fits in with the devs' existing process. If it impacts their development process very little, they will be more likely to adopt the system.

Application (Not a Markup Language) for Producing a User Manual [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more.
Closed 6 years ago.
Can anyone recommend a program to create user manuals with? Not a markup language (like LaTeX or DocBook) but something more interactive, like Scribus. As I'm not the only one who will update the manual, the software should be easy for a novice to pick up but still have some advanced features (like linking in text from external sources/tables, handling master pages/themes, etc.).
Regards,
Oscar
Technical Publishing Software - Views on FrameMaker and Its Alternatives
I've done spec documents with LaTeX and Framemaker, and designed a Framemaker workflow to support a team of 5 analysts producing a spec document for an insurance underwriting system. The document was expected to get to 2,000 pages or so. Many years ago (around 1992-1993) I also worked briefly as a typesetter.
Framemaker is designed for technical documentation and does it very well indeed. It also has features designed to support very large documents with multiple authors - people use this system to do documents with more than 100,000 pages. It is also more accessible than LaTeX to users familiar with word processing software.
Key features of Framemaker:
Documents consisting of multiple files: You can pull together a 'Book' with multiple subsections in different files. The document can also be kept in source control.
Textual MIF format for import/export: The importer is somewhat finicky (I found generating working LaTeX to be easier) but you can generate items such as data dictionaries and import them into the document. The file has textual anchors (see below) so you can create cross-reference links that will be stable across imports. I find this to be a key feature for specs as it allows cross-references to link directly to generated items.
Powerful tagging, indexing and cross-referencing system: Everything is based on tags in Framemaker and it is easy to apply tags quickly. This means that cross-referencing, indexing, conditional text and applying styles en masse are easy and just work. You can generate indexes and TOCs based on tags, so having multiple specialised indexes (such as a list of data field names from screens or a data dictionary) is easy to do. The document I described above had 4 separate indexes.
Stable: Framemaker is designed for professionals, so it doesn't second-guess you the way Word does. It is also much more stable on large documents. Anyone who's tried to write a document of more than 50-100 pages in Word should have a pretty fair idea of what this implies.
Scriptable: FM has a C API and there are various scripting plugins (FrameScript and FMPython being probably the most widely used) which can be used to automate jobs in FM. Framemaker 10 adds support for a JavaScript-based scripting tool called ExtendScript, presumably ported across from the scripting facility in InDesign.
Single-sourcing: From a single FM document you can produce PDF, Windows Help (CHM), HTML and print documents fairly easily. The cross-references also resolve to hyperlinks.
Global style controls: You can easily set up styles for a document and apply them across the whole document. It also facilitates running headers and footers, with a great deal of flexibility in having them track sections, versions, chapters etc.
Alternatives to Framemaker
LaTeX/Lout: You've already indicated that you don't want a markup language, but the TeX and Lout systems are used for large structured documents and do this well.
Ventura Publisher: Probably the only real alternative to Framemaker if you want that sort of user interface without paying bodily parts for the privilege. It has strong support for structured documents and an XML-based document interchange format. It's now owned by Corel, who still appear to be actively promoting it.
There are a couple of other technical publishing tools on the market: Quicksilver (which used to be known as Interleaf) and ArborText. These two are powerful tools - Interleaf used to be the market leader in this field at one point - but quite expensive.
Adobe InDesign: Although Adobe claim you can do large documents with InDesign, the cross-referencing and other large-document features tend to be viewed as lacking by Framemaker aficionados. There is, however, a text entry system for it called InCopy that apparently does have this sort of functionality, and quite a large body of third-party plugins, some of which do support tagging and other such facilities. InDesign also has a scripting API and a JavaScript interpreter for executing scripts. I haven't used InDesign, so I can't really comment on how well it works in practice.
DocBook: This is really just a standard format for structured documents but has a large ecosystem of tools surrounding it for writing and rendering documents. If you don't want to use LaTeX you will probably not want to use DocBook for similar reasons. As Vinko Vrsalovic points out (+1), this link goes to a StackOverflow post from someone describing using DocBook in practice. I've never really used DocBook, and I've made so many edits to this post that it's now in Wiki mode, so someone familiar with DocBook might want to elaborate on this.
Word processing software: Word has serious shortcomings as a technical publishing tool and is not recommended. OpenOffice has somewhat better structured documentation functionality than Word and may be a better choice if politics or a requirement to use .doc as a document interchange format preclude a better alternative. WordPerfect is also considerably better for documentation-in-the-large than Word and still has a presence in several vertical markets such as legal offices.
Madcap Software's Blaze and Flare: These are new kids on the block and live in roughly the same space as Framemaker. The company was founded by former eHelp (creators of RoboHelp) employees and is actively developing, with multiple releases yearly. Their offerings have greatly expanded in the past two years, to the detriment of the quality of the individual products. It seems the focus has been on turning out new products, and by consequence there are a lot of "fit and finish" issues in each. The authors have chosen to reinvent the wheel in many ways, resulting in confusing and often broken implementations. Save often; you will encounter unhandled exceptions. Source control integration is flaky. For example, moving or deleting a group of files will result in one source control commit for each file deletion. Big PITA when you have source control email notifications. Hello, 500 emails.
Flare can import Word and Framemaker files, but the import is far from seamless. Expect to retain all of your content, but plan on completely re-styling from scratch. Flare shares many of Word's tendencies to do too much behind the scenes and assume what the user would choose. The HTML looks like what Word outputs when you export HTML - lots of custom tags and attributes, deeply nested inline styles, etc. The text editor is maddening; for example, its cursor model is different from that of any other software you've ever used.
Framemaker vs. LaTeX
These two are the main systems I have used to produce large, presentable system documents, and I've had good results with both.
Ease of Learning: TeX can give you absolute control, but actually achieving this on a complex LaTeX document without breaking other items isn't trivial, particularly where a large number of macro packages are involved. Basic LaTeX isn't hard to learn, but making modified versions of .sty files that still work takes a bit of tinkering if you're not a really deep TeX hacker. It can be done, but be prepared to spend quite a lot of time fiddling.
Framemaker can give you a good degree of control over the look of the document and isn't that hard to learn. Getting a house style and tweaking the layout (which you probably will have to do) will be easier with Framemaker.
Ease of Text Entry: You can use tools such as LyX to provide a word-processor-like front end to LaTeX, and these work well if you want to write large bodies of text. Framemaker's DTP-like user interface works in a way familiar to people who are used to word processing software. From this perspective there is little practical difference.
Templating Document Structure: Framemaker allows a document structure to be defined in terms of tags or an XML schema (if using Structured Framemaker). LaTeX has a set of canned structural elements that are flexible enough to be useful. Adding additional structural elements (e.g. a data dictionary item) can be done as a macro, but making them auto-number is a bit more challenging and you will need to poke around behind the scenes. Both can do it, but it's considerably more technical to do in LaTeX in anything but trivial cases.
Also, LaTeX does not have the facility to template the document structure in the way that Structured Framemaker does. However, you can achieve this type of effect with DocBook and then generate to LaTeX if desired.
Ease of Integration: I found making a generator for non-trivially complex MIF files to be quite fiddly. The MIF parser is quite pernickety in FM and doesn't really give good diagnostics. LaTeX produces far better error messages and is quite a bit less fussy.
Technical Publishing Software vs. Layout Software
Page layout software started with PageMaker; the other main players in this space were its competitor Quark XPress and now InDesign, with which Adobe is essentially trying to deprecate and replace both PageMaker and Framemaker. Scribus, which you mentioned before, lives in the same space as these products.
If you are producing a manual with less than (say) 50-100 pages, one of the packages would probably do an adequate job. They are really designed for advertising and layout-heavy publication tasks such as magazines, so their support for large-document features of the sort found in Framemaker is fairly limited. The key issue with these products is scalability - they do not work well on large documents.
Just for reference I have actually typeset a 200-page book (someone's autobiography) using Pagemaker. While the fine-grained kerning and leading control helps a bit for copyfitting, it is still a highly manual process to lay out a book sized document. In this case the book was just straight text with no significant cross-referencing or structure other than chapters. Doing a complex technical spec document or manual this size with Pagemaker would have been very fiddly and probably next to impossible to get right without any mistakes.
Technical Publishing vs. Word Processing Software
This is more of a description of key shortcomings of MS-Word for large spec documents. However, it will illustrate some of the main features required for documentation-in-the-large:
Indexing and Cross-Referencing: This is a real chore in Word, and quite unstable. Framemaker's tagging features and LaTeX's labels mean that you can assign a tag or known label (in a predictable format if necessary). The textual format for the tag anchors is exposed in the user interface, and is used for the linkage. In Word, the anchors are much more opaque and not easily controllable in this way. Combined with the clumsy user interface and instability of the product, this makes maintaining these fiddly and often unstable - you often have to manually fix them up.
Templated Layouts: Style support in Word is quite basic, and numbering tends to be somewhat unstable. Framemaker is all about driving from the tags and applying styles based on the tags. Global style changes just work in Framemaker in a way that they do not in Word.
Large multi-file Documents: I've never been able to make this work well in Word, but it is a key feature in Framemaker and LaTeX. Again, Word's instability means that you tend to spend a lot of time tidying up after it. As the document grows larger, the proportion of time spent on this work grows quadratically: the propensity for breakage is proportional to n (the size of the document), and the time to fix each breakage is also proportional to n, so total clean-up time scales as n².
Why is Word so Unstable: Word does a lot behind the scenes to support novice users and intervene in layouts. It is also not really frame-based (text flow conceptually separate from document layout), but the developers try to implement various frame-like behaviours in the UI. When the A.I. second-guesses you on a complex document it often does the wrong thing. Framemaker 'treats the user as an adult' and does none of this, so things stay where you put them.
Other word processors such as OpenOffice and WordPerfect do not misbehave in quite the same way as Word, which is one of the reasons that just about any word processor other than Word will do a better job of technical documents.
Pre-Flighting: In documentation-speak, this is the process of checking that your assemblage of files for the document (image files etc.) is correct before committing to print. The professional systems will complain about things that are wrong, giving you a chance to correct them. Word will just put on a happy face and try to fix things behind the scenes. A good example of this is a Word file with linked graphics. If you copy the file and graphics to another directory and update one of the graphics in situ, Word may well still read the file from the old path (I've seen it do this) and not the new one you've just updated. However, this behaviour is not consistent and typifies the rampant abuse of unstable heuristics in that product.
Pre-Press Support: A publishing system extends into the pre-press phase of the workflow. This means it covers preparation for print. Word processing software tends not to have this functionality, or to have it in a very limited form.
Without getting too far into this, a key difference is that publishing software tends to treat you like a consenting adult and not get in the way when you want to scale or automate things. One can use word processing software for large scale documentation but it has many design decisions adapted to casual users writing short documents with little regard for quality. These adaptations come at the expense of fitness-for-task on large scale document preparation work. The main issues I find with Word for spec documents are the poor indexing and cross-referencing and general instability issues where I am always having to go back and fix things. However, political considerations in most environments (I'm a contractor) mean one is often stuck with it.
Some general comments on the state of technical documentation software
Framemaker would be the obvious choice if Adobe didn't keep giving off signals that they are trying to deprecate it and move its user base to InDesign. However, FM is widely used in aerospace, software and engineering circles and Adobe's management would face a lynch mob if they actually EOL'd the product without a credible migration path. From what one reads on the web, Adobe's acquisition of FM was driven by John Warnock, but he was ousted and FM became a victim of office politics. The net result is that it's been moved to maintenance mode and is quite stagnant.
Ventura Publisher has also been relegated to a niche market to some extent, but at least Corel do not have two competing product lines in the way that Adobe do. It is probably a passable substitute for FM and may be more politically acceptable to PHB types as it is marketed as a 'business publishing' system.
Quicksilver and Arbortext both seem to be viable products, but are very expensive. I've not used either, so I can't really make any real judgement on their merits.
The markup language systems are free and very powerful in many ways. Lout might be a bit easier to work with as it doesn't have quite the level of legacy baggage that LaTeX does. DocBook is also quite widely used and does have quite a bit of tool support. These technologies put a significant squeeze on the 'geek' end of Framemaker's market share and do so on their merits - they have probably taken quite a chunk out of Adobe's profit margins over the years. I would not dismiss these technologies out of hand, but they will be harder to learn in practice.
You might try evaluating InDesign and a selected set of plugins (concentrate on those for tagging and cross-ref/index management). Finally, some of the word processing software (Wordperfect and OpenOffice) give you a reasonable toolkit for structured documentation and work considerably better for this than MS-Word.
PostScript
Yes, that is a pun. I haven't touched on Pre-Press functionality of any of these products. Printing and Pre-Press are technical fields in their own right and the scope for expensive mistakes means you should probably leave this up to specialists.
Framemaker, InDesign, Ventura, QuickSilver, Arbortext and (presumably) the MadCap products all come with facilities to do pre-press preparation. By and large, word processing software does not.
Doing pre-press with LaTeX tends to involve post-processing the PS output with software like psutils or rendering to PDF and taking the pre-press workflow from there. Generally, most pre-press houses can work from PDF, so a good PDF writing tool like Distiller is the best interface for work prepared from tools that are not designed for prepress work. Note that the quality of the output from Distiller tends to be better than the Ghostscript based ones like PDFCreator.
Note that the RGB colour space of a monitor does not have a direct map to the CMYK colour space used by a printing press. Actually getting colours - especially colour photos - to come out correctly on a press is somewhat fraught if you do not have the right kit. For print production, see a specialist unless you have reason to believe you know what you're doing. For a casual user I would still recommend this 15 years after I was involved in the industry, as mistakes are very expensive to fix once they're committed to print.
If you really do want to do colour print work in-house, you will probably need to calibrate your monitor. For best results, you should get a high-fidelity monitor like this one from HP. In order to calibrate the monitor you may also need a sensor like one of the ones described in this review if the monitor does not come with one. Most professional graphics cards like these from Nvidia, AMD or Matrox have facilities to support gamma correction; many consumer ones do as well. You will also need to get calibration data for the press you are going to be using to print, although the pre-press house will probably be able to do this.
As stated before, print media is quite technical in its own right, easy to get wrong and expensive to fix once it's gone to print. If you're not 100% certain you've got your calibration right, get a colour proof like a Chromalin. This is done from the actual film separations (and is thus quite expensive), so it gives an accurate rendition of the actual colour of the final printed article. Doing this for a few sample pages will give you accurate feedback about whether your calibration is set up right.
Acknowledgements: Thanks to Aidan Ryan for expanding the section on Madcap products.
I would recommend "Help & Manual" from EC Software. You can create a printed manual, PDF, Windows help file (CHM), and HTML web-based help from a single source document.
I've heard good things about FrameMaker. I've not used it myself, but have had it recommended to me for just such an application.
Adobe Framemaker indeed is the classic tool for writing user manuals. I've used it for all kinds of long documents, and it works very well. Too bad that Adobe left it to rot for years, before noticing that users wouldn't switch.
MS Word took until 2003 to get the bullet/numbering bugs out, and I don't know if they ever got master documents working.
LaTeX still is a reasonable alternative. The format is easy to process, and you could generate it from a wiki.
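The "generate it from a wiki" idea can be as simple as a small translation script. A toy sketch in Python (the wiki heading syntax handled here is hypothetical and deliberately minimal):

    # Toy converter from a wiki-ish markup to LaTeX source.
    def wiki_to_latex(text: str) -> str:
        out = []
        for line in text.splitlines():
            if line.startswith("== "):
                # '== Title ==' becomes a LaTeX section heading
                out.append(r"\section{%s}" % line[3:].rstrip(" ="))
            else:
                out.append(line)
        return "\n".join(out)

    print(wiki_to_latex("== Build scripts ==\nOur nightly build runs at 02:00."))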
If you want collaboration, then a language-based approach (LaTeX would be my preference although XML-based ones are also good -- Docbook being the flagship here) does make sense, especially if you are tracking files with a version control system.
Anything that complicates matters, like any software with a binary or proprietary format, will not help you here.
Sorry if it is not the answer you want.
I agree with Ollivier that using DocBook (or LaTeX) is the sanest approach for easy conversion, sane formatting and nice version control.
Happily, you can try to have your cake and eat it too with a DocBook editor.
Try the ones on this list and see if any satisfies your needs (I haven't used any).
We are using "Help & Manual" from EC Software and it works quite well. Our authors are spread across the U.S., so we share our content files via a hosted SVN server to manage version control. On each workstation we use Tortoise SVN to stay in sync. The product is extremely easy to use and productive.
A VERY nice explanation of what O'Reilly (actually the ones selling all these books...) uses:
O'Reilly Toolchain
It may seem complicated, but depending on the number of pages you are going to write, you should maybe put some consideration into it.
Word (or your favorite word processor)
I make all my user manuals (not to be confused with user HELP files) in Word. Then I can determine if they need to be in PDF, RTF, DOC or even converted to HTML. To solve the multi-user updating issue, I store the file in source control, which handles all those fun things.
See the Fastware Project blog for an in depth discussion of the tradeoffs of using DocBook etc. Scott Meyer has tried out a lot of possibilities and shares what he's thinking.
Adobe InDesign CS5.5 is much better at cross-references and long documents than earlier versions. It is very powerful and relatively easy to learn and use. The feature set is very rich, and the more you learn about it, the more you can do with it. It supports very powerful XML features and can import and export XML as needed. It can also map styles to tags and tags to styles, allowing you to create your XML in an automated fashion if you simply use a full set of character and paragraph styles. I have used the program for years and produced multiple projects, from books to one-off advertisements. It is a graphic design tool, but has support for many aspects of book and manual production. I recommend it if you are more concerned with graphics, images or illustrations. InDesign supports a wide range of import and export formats.
InDesign CS5.5 has added and improved support for both interactive content and export for EPUB (electronic book) and Adobe's Digital Publishing Suite (DPS) electronic magazine formats.
Framemaker is an excellent tool for books, manuals and long technical documents. It is a bit harder to learn than InDesign but has a richer set of tools for building variables and running headers and footers, if you have the time and inclination to learn how to use them. It also has a very robust XML feature-set, but I have not used it personally.
Unfortunately, Framemaker suffers from a lack of support for graphic design. The color system is very kludgey, and spot (PMS) colors are hard to define. Simple things like adding a stroke color and fill color are rudimentary at best. For example, you still can't select a stroke color that's different from an object's fill color. The program is intended to output to laser and inkjet printers, and not really to printing presses.
One feature that is really cool is the ability to apply master pages based on the paragraph styles appearing on the page. The paragraph/illustration numbering in Framemaker is superior to that of any other program I have ever used. But it is also difficult to learn and use.
Both programs support output to PDF and PostScript file formats and can generate hyperlinks and interactive content.

How is it possible to sell code written in an interpreted language?

It seems to me that if you are writing in an interpreted language, it must be difficult to sell software, because anyone who buys it can edit it/change it/resell it without much difficulty.
How do you get around this? I have a couple of PHP apps that I'm reluctant to sell to people as it seems that it's too simple for them to change/read/edit/sell what I've produced.
Hardly anyone sells code. We sell the ability to create, edit, support and understand the code.
As a potential buyer of your application, I might find these features attractive:
The ability to change the code to suit my needs
The ability to read the code to better understand what it's doing
... and yes ...
The ability to sell my modifications
All three of those are features.
The third one might be a feature you can't afford to give me. Fix that through legal measures, not technical measures. That's what licensing is for. You could also sell more expensive licenses which do allow resale.
There are things you could do to remove the first two features, but bear in mind that in doing so you are reducing the overall value of your product to some people, and therefore its sale price.
For many people the primary reason for using Open Source software is not that it costs nothing -- it's that you get the source code.
People sell the service of creating web sites all the time. Also, even a compiled language can be altered; it's just more difficult.
Most of the time the user base does not understand how to make the changes or what to do with the scripts so you are really selling your knowledge of how to install and change the scripts.
Don't sell the software, sell "licences".
I'll try to explain better: build the web app but provide hosting for it. This way your client will never get to "hold" the source code.
If you really must deliver the source code, obfuscating it is the way to go ;)
Possible routes to go:
Translate to a bytecode, binary or an obfuscated format
For instance, Splunk is written mostly in Python and distributes bytecode. The EVE Online client uses Stackless Python to compile to an executable binary. (A minimal bytecode-compilation sketch follows this list.)
Host the solution yourself
Put up a website, charge for use.
License the software
They get the source, but cannot legally modify or redistribute the source.
Open source the solution
Anyone can change the code, but you are the de-facto authority on it, and you can earn money by selling support, consultancy and customization services.
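For the bytecode route with Python specifically, here is a minimal sketch using the standard library's compileall module (the paths are hypothetical; note that .pyc files are interpreter-version-specific and can still be decompiled, so this is a speed bump, not real protection):

    import compileall
    import pathlib

    src = pathlib.Path("myapp")  # hypothetical source tree

    # Write foo.pyc next to foo.py (legacy layout) so the bytecode
    # remains importable after the .py sources are removed.
    compileall.compile_dir(src, force=True, legacy=True)

    # Ship only the bytecode: delete the .py sources before packaging.
    for py in src.rglob("*.py"):
        py.unlink()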
You could also consider a combination of approaches. E.g., partition your solution into several stand alone packages, and then open source some of them, and sell bytecode versions of other parts. What you then sell is the complete solution, as well as other services, and some people may benefit and enhance other parts of the solution.
Plenty of companies make money off of applications in interpreted languages and happily distribute the source code with them. Don't take this personally, but your program probably isn't going to be popular enough to have a large following of pirates. And anybody who would pirate your software probably isn't going to buy it in the first place. If they can't pirate it, they'll pirate somebody else's.
Whatever you do, please don't obfuscate your code. It's not an effective means of preventing infringement and it won't do anything other than make life miserable for you and your customers.
Protecting your secret bits is getting more and more difficult.
IMHO, your solution really depends on your target market. If you are targeting business, just give them the code with a good license, and possibly some type of defect so you can determine who gave your code away if that ever happens. Businesses will mostly pay for your app just to stay compliant; it's not worth the legal hassles. And if an individual gets your app for free, that's probably a good thing, since they will try to convince their current and future employers to buy it.
If you are targeting individuals, and can do it as a web app (which you obviously are with PHP), do it as a hosted service, and either sell a monthly subscription or allow free access and find another way to monetize it.
If you definitely need to or want to distribute it to individuals for whatever reason, you can give it away for free, and try to monetize customizations, add-ins, & other support features.
This is a problem that's been discussed a lot, and a few hours’ worth of really focused googling should reveal all the current philosophies on this.
I hope this helps.
Obfuscation may be a good way to go
With PHP you have the option of using Zend Guard for PHP. I believe it compiles the source code in a way similar to what the PHP interpreter does, so it should also increase performance. Of course, the price of $600 may be too much for your liking ;-)
Anyway, I see no reason why you shouldn't distribute your code with an open source license (see the Open Source Initiative for a list of licenses available). You can find one that prohibits your customer from redistributing your app.
EDIT:
As Novelocrat points out in his comment, a license that prohibits distribution is per definitionem not an Open Source license, the term Open Source refers to a lot more than just the availability of the source code. (See also the answers to this related question for further discussion).

Penetration testing tools [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
We have hundreds of websites which were developed in ASP, .NET and Java, and we are paying a lot of money to an external agency to do penetration testing on our sites to check for security loopholes.
Is there any good software (paid or free) to do this?
Or are there any technical articles which could help me develop such a tool?
There are a couple different directions you can go with automated testing tools for web applications.
First, there are the commercial web scanners, of which HP WebInspect and Rational AppScan are the two most popular. These are "all-in-one", "fire-and-forget" tools that you download and install on an internal Windows desktop and then give a URL to spider your site, scan for well-known vulnerabilities (i.e., the things that have hit Bugtraq), and probe for cross-site scripting and SQL injection vulnerabilities.
Second, there are the source-code scanning tools, of which Coverity and Fortify are probably the two best known. These are tools you install on a developer's desktop to process your Java or C# source code and look for well-known patterns of insecure code, like poor input validation.
Finally, there are the penetration test tools. By far the most popular web app penetration testing tool among security professionals is Burp Suite, which you can find at http://www.portswigger.net/proxy. Others include Spike Proxy and OWASP WebScarab. Again, you'll install this on an internal Windows desktop. It will run as an HTTP proxy, and you'll point your browser at it. You'll use your applications as a normal user would, while it records your actions. You can then go back to each individual page or HTTP action and probe it for security problems.
In a complex environment, and especially if you're considering anything DIY, I strongly recommend the penetration testing tools. Here's why:
Commercial web scanners provide a lot of "breadth", along with excellent reporting. However:
They tend to miss things, because every application is different.
They're expensive (WebInspect starts in the tens of thousands).
You're paying for stuff you don't need (like databases of known bad CGIs from the '90s).
They're hard to customize.
They can produce noisy results.
Source code scanners are more thorough than web scanners. However:
They're even more expensive than the web scanners.
They require source code to operate.
To be effective, they often require you to annotate your source code (for instance, to pick out input pathways).
They have a tendency to produce false positives.
Both commercial scanners and source code scanners have a bad habit of becoming shelfware. Worse, even if they work, their cost is comparable to getting 1 or 2 entire applications audited by a consultancy; if you trust your consultants, you're guaranteed to get better results from them than from the tools.
Penetration testing tools have downsides too:
They're much harder to use than fire-and-forget commercial scanners.
They assume some expertise in web application vulnerabilities --- you have to know what you're looking for.
They produce little or no formal reporting.
On the other hand:
They're much, much cheaper --- the best of the lot, Burp Suite, costs only 99 EUR, and has a free version.
They're easy to customize and add to a testing workflow.
They're much better at helping you "get to know" your applications from the inside.
Here's something you'd do with a pen-test tool for a basic web application:
Log into the application through the proxy
Create a "hit list" of the major functional areas of the application, and exercise each once.
Use the "spider" tool in your pen-test application to find all the pages and actions and handlers in the application.
For each dynamic page and each HTML form the spider uncovers, use the "fuzzer" tool (Burp calls it an "intruder") to exercise every parameter with invalid inputs; a scripted sketch of this step follows these steps. Most fuzzers come with basic test strings that include:
SQL metacharacters
HTML/Javascript escapes and metacharacters
Internationalized variants of these to evade input filters
Well-known default form field names and values
Well-known directory names, file names, and handler verbs
Spend several hours filtering the resulting errors (a typical fuzz run for one form might generate 1000 of them) looking for suspicious responses.
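If you end up scripting parts of this yourself, the fuzzing step can be as simple as the following Python sketch using the requests library (the target URL, field names, and payloads are all hypothetical; you'd reuse the logged-in session from your proxy, and of course only run this against applications you're authorized to test):

    import requests

    TARGET = "https://staging.example.com/search"  # hypothetical app URL
    FIELDS = ["q", "category"]                     # taken from the spider's output
    PAYLOADS = ["'", '"><script>alert(1)</script>', "%27%20OR%201=1--"]

    session = requests.Session()  # reuse login cookies here

    for field in FIELDS:
        for payload in PAYLOADS:
            resp = session.post(TARGET, data={field: payload})
            # Crude filter: flag server errors and reflected payloads
            # for the manual review pass described above.
            if resp.status_code >= 500 or payload in resp.text:
                print(field, repr(payload), resp.status_code)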
This is a labor-intensive, "bare-metal" approach. But when your company owns the actual applications, the bare-metal approach pays off, because you can use it to build regression test suites that will run like clockwork at each dev cycle for each app. This is a win for a bunch of reasons:
Your security testing will take a predictable amount of time and resources per application, which allows you to budget and triage.
Your team will get maximally accurate and thorough results, since your testing is going to be tuned to your applications.
It's going to cost less than commercial scanners and less than consultants.
Of course, if you go this route, you're basically turning yourself into a security consultant for your company. I don't think that's a bad thing; if you don't want that expertise, WebInspect or Fortify isn't going to help you much anyways.
I know you asked specifically about pentesting tools, but since those have been amply answered (I usually go with a mix of AppScan and trained pentester), I think it's important to point out that pentesting is not the only way to "check for security loopholes", and is often not the most effective.
Source code review tools can provide you with much better visibility into your codebase, and find many flaws that pentesting won't.
These include Fortify and OunceLabs (expensive and for many languages), VisualStudio.NET CodeAnalysis (for .NET and C++, free with VSTS, decent but not great), OWASP's LAPSE for Java (free, decent not great), CheckMarx (not cheap, fanTASTic tool for .NET and Java, but high overhead), and many more.
An important point you must note - (most of) the automated tools do not find all the vulnerabilities, not even close. You can expect the automated tools to find approximately 35-40% of the secbugs that would be found by a professional pentester; the same goes for automated vs. manual source code review.
And of course a proper SDLC (Security Development Lifecycle), including Threat Modeling, Design Review, etc, will help even more...
McAfee Secure is not a solution. The service they provide is a joke.
See below:
http://blogs.zdnet.com/security/?p=1092&tag=rbxccnbzd1
http://blogs.zdnet.com/security/?p=1068&tag=rbxccnbzd1
http://blogs.zdnet.com/security/?p=1114&tag=rbxccnbzd1
I've heard good things about SPI Dynamics' WebInspect as far as paid solutions go, as well as Nikto (for a free solution) and other open source tools. Nessus is an excellent tool for infrastructure in case you need to check that layer as well. You can pick up a live CD with several tools on it called Nubuntu (Auditor, Helix, or any other security-based distribution works too) and then Google up some tutorials for the specific tool. Always, always make sure to scan from the local network though. You run the risk of having yourself blocked by the data center if you scan a box from the WAN without authorization. Lesson learned the hard way. ;)
Skipfish, w3af, arachni, ratproxy, ZAP, WebScarab : all free and very good IMO
http://www.nessus.org/nessus/ -- Nessus will help suggest ways to make your servers better. It can't really test custom apps by itself, though I think the plugins are relatively easy to create on your own.
Take a look at Rational AppScan (it used to be called Watchfire). It's not free, but it has a nice UI, is dead powerful, generates reports (bespoke and against standard compliance frameworks such as Basel II), and I believe you can script it into your CI build.
How about Nikto?
For this type of testing you really want to be looking at some type of fuzz tester. SPIKE Proxy is one of a couple of fuzz testers for web apps. It is open source and written in Python. I believe there are a couple of videos from BlackHat or DefCON on using SPIKE out there somewhere, but I'm having difficulty locating them.
There are a couple of high-end professional software packages that will do the web app testing and much more. One of the more popular tools would be Core Impact.
If you do plan on going through with the Pen Testing on your own I highly recommend you read through much of the OWASP Project's documentation. Specifically the OWASP Application Security Verification and Testing/Development guides. The mindset you need to thoroughly test your application is a little different than your normal development mindset (not that it SHOULD be different, but it usually is).
What about ratproxy?
"A semi-automated, largely passive web application security audit tool, optimized for an accurate and sensitive detection, and automatic annotation, of potential problems and security-relevant design patterns based on the observation of existing, user-initiated traffic in complex web 2.0 environments. Detects and prioritizes broad classes of security problems, such as dynamic cross-site trust model considerations, script inclusion issues, content serving problems, insufficient XSRF and XSS defenses, and much more."
Ratproxy is currently believed to support Linux, FreeBSD, MacOS X, and Windows (Cygwin) environments.
McAfee Secure (formerly HackerSafe).