Software Metrics in Agile Methodologies [closed] - process

Agile methodologies are rather prevalent these days, but I cannot seem to find much documentation on which metrics are most useful and why. I have found far more material saying that some traditional metrics, like LOC and test code coverage, are not appropriate, which leaves two main questions:
Why are those two (and other) metrics inappropriate?
What metrics are best for Agile and why?
Even with an Agile process, wouldn't you want to know how much code coverage you have with your unit tests? Or is it simply that this metric (and others) is not as useful as other metrics like cyclomatic complexity and velocity?

Agile is a business-oriented thing: Agile is about maximizing customer value while minimizing waste, to provide the best possible ROI. This is what should get measured. And to do so, I use the system that Mary Poppendieck recommends. This system is based on three holistic measurements that must be taken as a package:
Cycle time
From product concept to first release or
From feature request to feature deployment or
From bug detection to resolution
Business Case Realization (without this, everything else is irrelevant)
P&L or
ROI or
Goal of investment
Customer Satisfaction
e.g. Net Promoter Score
Sure, at the team level you can track things like test coverage, cyclomatic complexity, conformance to coding standards, etc., but high quality is not an end in itself, it's just a means. Don't misinterpret me: I'm not saying high quality doesn't matter; high quality is mandatory to achieve a sustainable pace (and we include "no increase of the technical debt" in our Definition of Done). But still, the goal is to deliver value to the customer in a fast and profitable way.
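To make "cycle time" concrete, here is a minimal Python sketch (the dates are invented and the tracker fields are hypothetical) of how it could be computed from whichever pair of events you decide to track:

    from datetime import datetime
    from statistics import mean

    # (feature requested, feature deployed) pairs pulled from your tracker;
    # the dates below are invented for the example.
    issues = [
        (datetime(2019, 3, 1), datetime(2019, 3, 19)),
        (datetime(2019, 3, 5), datetime(2019, 4, 2)),
        (datetime(2019, 3, 12), datetime(2019, 4, 6)),
    ]

    cycle_times = [(deployed - requested).days for requested, deployed in issues]
    print(f"average cycle time: {mean(cycle_times):.1f} days")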

Irrespective of methodology, there are some basic metrics that can and should be used.
According to Stephen H. Kan, the most important are the following three:
size of product
number of defects found in final phase of testing
and number of defects found in the field.
If those are all you track, there are at least five ways they can be used:
calculate product defect rate (A)
calculate test defect rate (B)
determine a desirable goal for A and monitor the performance
determine a desirable goal for B and monitor the performance
assess correlation between A and B
if correlation is found, form metric of test effectiveness (B/A * 100%)
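As a minimal sketch of how those measurements combine (the counts, the KLOC size unit, and the reading of "product defect rate" as covering all known defects are assumptions for the example):

    size_kloc = 120.0          # size of product, in KLOC (unit is an assumption)
    defects_final_test = 36    # defects found in the final phase of testing
    defects_field = 9          # defects found in the field

    # A: product defect rate over all known defects (one plausible reading)
    product_defect_rate = (defects_final_test + defects_field) / size_kloc
    # B: test defect rate
    test_defect_rate = defects_final_test / size_kloc

    # If A and B correlate across releases, form the effectiveness metric B/A * 100%
    test_effectiveness = test_defect_rate / product_defect_rate * 100
    print(f"A={product_defect_rate:.3f}/KLOC, B={test_defect_rate:.3f}/KLOC, "
          f"test effectiveness={test_effectiveness:.0f}%")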
Although not necessarily fun to read, Metrics and Models in Software Quality Engineering provides an excellent in-depth overview of software engineering and metrics.

1.1) Why LOC is inappropriate is easy to answer:
It is really dependent on the language you use! The same feature might differ greatly in size when written in Java or in Ruby, for example.
Poorly written software might have more lines than well-written software!
1.2) Code coverage
IMHO you should use this metric; although it's not perfect, it gives you a good sense of where your code needs more tests.
One point to be careful about is that coverage, too, is language-dependent. There could be situations where you have a class or method that you really don't need to test, for example a class with only getters and setters.
2) In (1) you just mentioned code metrics, but judging from your question about velocity, you are interested in metrics for the whole creation process, so I would list some:
Velocity: the classic one; used well, it can considerably improve an agile team's performance, since you will know what your team can really get done in a fixed time.
Burn-up and burn-down charts: they give you a good notion of how the team is performing during the iteration (sprint); a minimal sketch of the data behind one follows the links below.
There are some articles on InfoQ about this. Here and here.
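As promised above, a minimal Python sketch (with made-up numbers) of the data behind a burn-down chart: remaining story points per sprint day compared against the ideal line:

    # Remaining story points at the end of each sprint day (made-up data).
    remaining = [40, 36, 35, 30, 24, 24, 18, 12, 7, 0]
    days = len(remaining)
    ideal = [40 * (1 - d / (days - 1)) for d in range(days)]

    for day, (actual, target) in enumerate(zip(remaining, ideal), start=1):
        status = "behind" if actual > target else "on track"
        print(f"day {day:2}: {actual:3} pts left (ideal {target:5.1f}) - {status}")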

As for question 1, I don't see any reason those metrics would be bad in an Agile process.
LOC provides you with a relative size measurement. While it may not always be useful for comparing numbers between projects, it can provide you with a rate of growth within a project. If you can get it, the number of lines changed within a sprint may be useful as well, to track a rate of refactoring.
Code coverage (of lines of code) gives you a general sense of whether or not your team is meeting a minimum bar of automated testing within a project.
As for question 2, keep the items above and here are a few more:
LOC versus test count. If you can, maintain separate ratios for unit, integration and system tests.
Average number of acceptance criteria versus test scenarios (or tests) for each story. It can help provide a better sense of whether or not you're testing against the story's intent.
Number of defects discovered
Amount of work discovered that wasn't part of the original estimates (this is often captured by Agile tracking software). It will help you judge whether you are doing 'enough' planning.
Tracking the consistency, or lack thereof, of velocity from sprint to sprint
While probably not popular, and potentially dangerous: tracking estimates against work completed for each developer. While teams are supposed to be self-organized and self-driven, not all teams are capable of dealing with human problems.

Just to add
Why LOC and Code Coverage of Tests are less than ideal:
Agile emphasizes outcome, not output (see Agile Manifesto). These two simply track output. Also, they do not properly measure refactoring, which is a vital aspect of Agile processes.
Another metric to consider would be Running Tested Features. I can't describe it any better than this: http://xprogramming.com/articles/jatrtsmetric/

I'm going to answer to this very old question...
LOC and test coverage are, in my opinion, good metrics, but they have one big problem: if you push them, you can make them grow fast, and the result will be terrifying: tons of nonsense code; or, for test coverage, you can invoke all your code in a try-catch block and not write a single assert... Or even worse, write asserts just for "compliance" reasons, without any business-facing or code-facing meaning...
So, these kinds of metrics are very good if they help the team honestly evaluate their outcome, but they are an evil tool if they form part of some "compliance" rules, as using them that way causes more harm (dead code, bad tests!) than the good you originally wanted to achieve.
So, with every metric, think about how you would game it if you were forced to achieve a certain value, and think of the consequences... This is not specific to LOC or test coverage; many other metrics can have similar outcomes, even cyclomatic complexity... If you split your code up badly, you can reduce cyclomatic complexity, but that doesn't mean you get better or more readable code!
So, these kind of metrics are quite good to see what's happening inside a team, but any measure you take should be based on concrete goals, not on the metric itself... For example:
Test coverage is low: you implement coding dojos once a month to help train people to write testable code, you find out what code has the worst test coverage and try to implement a better / more testable architecture that helps / motivates developers to write tests, etc.
As you can see, you never tell the team to achieve a certain value of test coverage; you just use the metric to see where you can improve, and then look for measures that benefit your process. After a while you would expect test coverage to increase, but you are not pushing people to do so! You are evaluating changes in order to see whether the measures are helping. If after a time you find that test coverage has not changed despite your measures, then it's time to look for other ideas, and so on...


Taguchi methods - number of experiments - reg

I want to conduct an experiment with ten factors (factors like costs and capacities) to learn the influence of each factor on the optimum value of an optimization problem. I want to know how many levels each factor needs and how many experiments to run, with the factor levels for each experiment.
The cost of the experiments does not matter, because they will be run in software, but the run time does: if a large number of experiments is required, the total time will be long.
Please shed some light on this.
You have Minitab as a tag on this question, and indeed Minitab has an excellent capability to help in planning DOEs. Go to the Assistant menu, then DOE, then Plan and Create...
If you click "Create Modeling Design" in the optimization experiment path, it will give you the setup screen where you can specify response, optimization objective, factors, etc. Notice that the design is a typical factorial-type design where low/high values are used in each experimental run. This should give good results, but just to let you know there are other design types that can be even better given the circumstances of each situation. For instance, you mentioned these are software experiments -- there is a nice design called a "Space Filling Design" which creates factor design points (not necessarily at low/high values) to optimally fill the design search space. These designs are often used for computer simulation experiments.
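Outside Minitab, the space-filling idea is easy to prototype. Here is a minimal Python sketch (assuming ten factors with hypothetical 0-100 ranges) of a simple Latin hypercube, one common space-filling design:

    import numpy as np

    rng = np.random.default_rng(42)
    n_runs, n_factors = 30, 10
    lows = np.zeros(n_factors)         # hypothetical low value per factor
    highs = np.full(n_factors, 100.0)  # hypothetical high value per factor

    # One stratum per run and factor, shuffled independently per factor and
    # jittered within each stratum: a basic Latin hypercube in [0, 1).
    strata = rng.permuted(np.tile(np.arange(n_runs), (n_factors, 1)), axis=1).T
    unit = (strata + rng.random((n_runs, n_factors))) / n_runs
    design = lows + unit * (highs - lows)
    print(design.shape)  # (30, 10): 30 runs covering the 10-factor space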
An excellent text on DOE is https://www.amazon.com/Design-Analysis-Experiments-Douglas-Montgomery/dp/1118146921

Dynamic number of test cases in genetic programming?

When looking at genetic programming papers, it seems to me that the number of test cases is always fixed. However, most mutations should (?) be very deleterious at every stage of the execution, i.e. make it obvious after one test case that the mutated program performs much worse than the previous one. What happens if you at first try only very few (one?) test cases and check whether the mutation makes any sense?
Is it maybe so that different test cases test for different features of the solutions, and one mutation will probably improve only one of those features?
I don't know if I agree with your assumption that most mutations should be very deleterious, but you shouldn't care even if they were. Your goal is not to optimize the individuals, but to optimize the population. So trying to determine if a "mutation makes any sense" is exactly what genetic programming is supposed to do: i.e. eliminate mutations that "don't make sense." Your only "guidance" for the algorithm should come through the fitness function.
I'm also not sure what you mean by "test case", but to me it sounds like you are looking for something related to multi-objective optimization (MOO). That means you try to optimize a solution with regard to different aspects of the problem - so you do not need to mutate/evaluate a population for a specific test case, but rather to find a multi-objective fitness function.
"The main idea in MOO is the notion of Pareto dominance" (http://www.gp-field-guide.org.uk)
I think this is a good idea in theory but tricky to put into practice. I can't remember seeing this approach actually used before but I wouldn't be surprised if it has.
I presume your motivation for doing this is to improve the efficiency of applying the fitness function - you can stop evaluation early and discard the individual (or set its fitness to 0) if the tests look like they're going to be terrible.
One challenge is to decide how many test cases to apply; discarding an individual after one random test case is surely not a good idea, as that test case could be a real outlier. Terminating evaluation after 50% of the test cases if the individual's fitness is below 10% of the best would probably not discard any very good individuals; on the other hand, since a lot of individuals will be of middle-of-the-road fitness, it might only save a small proportion of the computation. You could adjust the numbers to save more effort, but the more effort you try to save, the greater the chance of genuinely good individuals being discarded by accident.
Factor in the extra time taken to code this, possible bugs, etc., and I shouldn't think the benefit would be worthwhile (unless this is a research project, in which case it might be interesting to try it and see).
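For what it's worth, the early-abort idea fits in a few lines. A minimal Python sketch (all names are hypothetical) using the 50%-of-cases / 10%-of-best thresholds mentioned above:

    def fitness(program, test_cases, best_so_far, cutoff=0.5, floor=0.1):
        # program: a callable; test_cases: list of (inputs, expected) pairs;
        # best_so_far: best fitness (pass rate in [0, 1]) seen in the population.
        passed = 0
        for i, (inputs, expected) in enumerate(test_cases, start=1):
            if program(inputs) == expected:
                passed += 1
            # After half the cases, discard if the partial pass rate is far
            # below the best fitness seen so far.
            if i >= cutoff * len(test_cases) and passed / i < floor * best_so_far:
                return 0.0
        return passed / len(test_cases)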
I think it's a good idea. Fitness evaluation is the most computationally intensive process in GP, so estimating the fitness of individuals in order to reduce the computational expense of actually calculating it could be an important optimization.
Your idea is a form of fitness approximation; sometimes it's called lazy evaluation (try searching for these terms; there are some research papers).
There are also distinct but somewhat overlapping schemes, for instance:
Dynamic Subset Selection (Chris Gathercole, Peter Ross) is a method to select a small subset of the training data set on which to actually carry out the GP algorithm;
Segment-Based Genetic Programming (Nailah Al-Madi, Simone Ludwig) is a technique that reduces the execution time of GP by partitioning the dataset into segments and using the segments in the fitness evaluation process.
P.S. Also, in Brood Recombination Crossover (Tackett), child programs are usually evaluated on a restricted number of test cases to speed up the crossover.

estimating of testing effort as a percentage of development time [closed]

Does anyone use a rule of thumb basis to estimate the effort required for testing as a percentage of the effort required for development? And if so what percentage do you use?
From my experience, 25% of effort is spent on analysis; 50% on design, development, and unit testing; and the remaining 25% on testing. Most projects will fit within a +/-10% variance of this rule of thumb depending on the nature of the project, knowledge of resources, quality of inputs and outputs, etc. One can add a project-management overhead within these percentages or as an overhead on top, in the 10-15% range.
The Google Testing Blog discussed this problem recently:
So a naive answer is that writing tests carries a 10% tax. But we pay taxes in order to get something in return.
(snip)
These benefits translate to real value today as well as tomorrow. I write tests because the additional benefits I get more than offset the additional cost of 10%. Even if I don't include the long-term benefits, the value I get from tests today is well worth it. I am faster at developing code with tests. How much faster? Well, that depends on the complexity of the code. The more complex the thing you are trying to build (more ifs/loops/dependencies), the greater the benefit of tests.
When you're estimating testing you need to identify the scope of your testing - are we talking unit test, functional, UAT, interface, security, performance stress and volume?
If you're on a waterfall project you probably have some overhead tasks that are fairly constant. Allow time to prepare any planning documents, schedules and reports.
For a functional test phase (I'm a "system tester" so that's my main point of reference) don't forget to include planning! A test case often needs at least as much effort to extract from requirements / specs / user stories as it will take to execute. In addition you need to include some time for defect raising / retesting. For a larger team you'll need to factor in test management - scheduling, reporting, meetings.
Generally my estimates are based on the complexity of the features being delivered rather than a percentage of dev effort. However this does require access to at least a high-level set of instructions. Years of doing testing enables me to work out that a test of a particular complexity will take x hours of effort for preparation and execution. Some tests may require extra effort for data setup. Some tests may involve negotiating with external systems and have a duration far in excess of the effort required.
In the end, though, you need to review it in the context of the overall project. If your estimate is well above that for BA or Development then there may be something wrong with your underlying assumptions.
I know this is an old topic but it's something I'm revisiting at the moment and is of perennial interest to project managers.
Some years ago, in a safety-critical field, I heard a figure of something like one day to unit test ten lines of code.
I have also observed 50% of effort for development and 50% for testing (not only unit testing).
Are you talking about automated unit/integration tests or manual tests?
For the former, my rule of thumb (based on measurements) is 40-50% added to development time, i.e. if developing a use case takes 10 days (before any QA and serious bugfixing happens), writing good tests takes another 4 to 5 days - though this should ideally happen before and during development, not afterwards.
When you speak of tests, you could mean waterfall or agile test development. In an agile environment, developers should spend 50% of their time developing and maintaining tests.
But that 50% extra will save you time when the re-factoring and manual verification time comes.
Testing time is probably more closely correlated to feature scope than development time. I'd also argue (perhaps controversially) that testing time is correlated to the skill of your development team.
For a 6-to-9 month development effort, I demand an absolute minimum of 2 weeks of testing time, performed by actual testers (not the development team) who are well-versed in the software they will be testing (i.e., 2 weeks does not include ramp-up time). This is for a project that has ~5 developers.
Gartner in Oct 2006 states that testing typically consumes between 10% and 35% of work on a system integration project. I assume that it applies to the waterfall method. This is quite a wide range - but there are many dependencies on the amount of customisations to a standard product and the number of systems to be integrated.
The only time I factor in extra time for testing is if I'm unfamiliar with the testing technology I'll be using (e.g. using Selenium tests for the first time). Then I factor in maybe 10-20% for getting up to speed on the tools and getting the test infrastructure in place.
Otherwise testing is just an innate part of development and doesn't warrant an extra estimate. In fact, I'd probably increase the estimate for code done without tests.
EDIT: Note that I'm usually writing code test-first. If I have to come in after the fact and write tests for existing code that's going to slow things down. I don't find that test-first development slows me down at all except for very exploratory (read: throw-away) coding.
Judge by yesterday's weather. How long did it take last time? Are you trending longer or shorter? Each shop is different.
Most agile shops need a lot less time, have drastically fewer defects, and quicker time to resolve them because of TDD. Even so, most agile shops have some measurable time spent with testing/QC.
If this is the first test run for this application, then the answer is "let's see", followed by an attempt. It depends on:
- how quickly you can get questions answered,
- how testable it is,
- how many features/functions there are,
- how many defects are discovered,
- how quickly issues are resolved,
- how many times the code cycles through testing, and
- how many times testing is blocked by bugs.
There is no way to tell. You could call it 50% or 175% or more, and not be wrong. Why not make a rough guess and multiply by Pi? It won't be much worse than any other answer you can make up.
You should (must) know how long it takes now and whether it's getting faster or slower, and whether the coverage is increasing or decreasing. With those three bits of information, you should be able to guess quite well.
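A minimal Python sketch of that "yesterday's weather" bookkeeping (the durations are invented): fit a simple trend to past test-cycle lengths and extrapolate one cycle ahead:

    past_cycles_days = [12, 11, 13, 10, 9]   # duration of each past test cycle

    n = len(past_cycles_days)
    mean_x, mean_y = (n - 1) / 2, sum(past_cycles_days) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(past_cycles_days))
    den = sum((x - mean_x) ** 2 for x in range(n))
    slope = num / den                         # least-squares trend, days per cycle
    forecast = mean_y + slope * (n - mean_x)  # extrapolate to the next cycle
    print(f"trend {slope:+.2f} days/cycle, next cycle ~{forecast:.1f} days")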

What are the most useful software development metrics? [closed]

I would like to track metrics that can be used to improve my team’s software development process, improve time estimates, and detect special case variations that need to be addressed during the project execution.
Please limit each answer to a single metric, describe how to use it, and vote up the good answers.
ROI.
The total amount of revenue brought in by the software minus the total cost to produce the software. Break down the costs by percentage of total cost and isolate your poorest-performing and most expensive areas in terms of return on investment. Improve, automate, or eliminate that problem area if possible. Conversely, find your highest return-on-investment area and find ways to amplify its effects even further. If 80% of your ROI comes from 20% of your cost or effort, expand that particular area and minimize the rest by comparison.
Costs will include payroll, licenses, legal fees, hardware, office equipment, marketing, production, distribution, and support. This can be done at a macro level for a company as a whole or at a micro level for a team or individual. It can also be applied to time, tasks, and methods in addition to revenue.
This doesn't mean ignore all the details, but find a way to quantify everything and then concentrate on the areas that yield the best (objective) results.
Inverse code coverage
Get a percentage of code not executed during a test. This is similar to what Shafa mentioned, but the usage is different. If a line of code is run during testing, then we know it might be tested. But if a line of code has not been run, then we know for sure that it has not been tested. Targeting these areas for unit testing will improve quality and takes less time than auditing the code that has been covered. Ideally you can do both, but that never seems to happen.
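A minimal Python sketch of getting at the un-executed lines with coverage.py (the "tests" module name is hypothetical); the "Missing" column of the report is exactly the inverse coverage described above:

    import coverage
    import unittest

    cov = coverage.Coverage()
    cov.start()
    unittest.main(module="tests", exit=False)  # run your (hypothetical) test module
    cov.stop()
    cov.report(show_missing=True)  # "Missing" lists the lines never executed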
"improve my team’s software development process": Defect Find and Fix Rates
This relates to the number of defects or bugs raised against the number of fixes which have been committed or verified.
I'd have to say this is one of the really important metrics because it gives you two things:
1. Code churn. How much code is being changed on a daily/weekly basis (which is important when you are trying to stabilize for a release), and,
2. Shows you whether defects are ahead of fixes or vice-versa. This shows you how well the development team is responding to defects raised by the QA/testers.
A low fix rate indicates the team is busy working on other things (features perhaps). If the bug count is high, you might need to get developers to address some of the defects.
A low find rate indicates either your solution is brilliant and almost bug free, or the QA team have been blocked or have another focus.
Track how long it takes to do a task that has an estimate against it. If it came in well under, ask why. If it ran well over, ask why.
Don't make it a negative thing, it's fine if tasks blow out or were way under estimated. Your goal is to continually improve your estimation process.
Track the source and type of bugs that you find.
The bug source represents the phase of development in which the bug was introduced (e.g. specification, design, implementation, etc.).
The bug type is the broad style of bug, e.g. memory allocation, incorrect conditional.
This should allow you to alter the procedures you follow in that phase of development and to tune your coding style guide to try to eliminate over-represented bug types.
Velocity: the number of features per given unit time.
Up to you to determine how you define features, but they should be roughly the same order of magnitude otherwise velocity is less useful. For instance, you may classify your features by stories or use cases. These should be broken down so that they are all roughly the same size. Every iteration, figure out how many stories (use-cases) got implemented (completed). The average number of features/iteration is your velocity. Once you know your velocity based on your feature unit you can use it to help estimate how long it will take to complete new projects based on their features.
[EDIT] Alternatively, you can assign a weight like function points or story points to each story as a measure of complexity, then add up the points for each completed feature and compute velocity in points/iteration.
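A minimal sketch of the arithmetic (the history and backlog size are made up):

    import math

    completed_per_sprint = [21, 18, 24, 20]   # story points finished per sprint
    velocity = sum(completed_per_sprint) / len(completed_per_sprint)

    backlog_points = 120                      # new project, estimated in points
    sprints_needed = math.ceil(backlog_points / velocity)
    print(f"velocity ~{velocity:.1f} pts/sprint -> ~{sprints_needed} sprints")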
Track the number of clones (similar code snippets) in the source code.
Get rid of clones by refactoring the code as soon as you spot the clones.
Average function length, or possibly a histogram of function lengths to get a better feel.
The longer a function is, the less obvious its correctness. If the code contains lots of long functions, it's probably a safe bet that there are a few bugs hiding in there.
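For Python code, a minimal sketch of such a histogram using the standard ast module (Python 3.8+ for end_lineno; the file name is hypothetical):

    import ast
    from collections import Counter

    with open("some_module.py") as f:   # hypothetical module to analyze
        tree = ast.parse(f.read())
    lengths = [node.end_lineno - node.lineno + 1
               for node in ast.walk(tree)
               if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))]

    histogram = Counter(length // 10 * 10 for length in lengths)  # 10-line bins
    for bucket in sorted(histogram):
        print(f"{bucket:3}-{bucket + 9:3} lines: {'#' * histogram[bucket]}")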
Number of failing tests or broken builds per commit.
Interdependency between classes: how tightly your code is coupled.
Track whether a piece of source has undergone review and, if so, what type. And later, track the number of bugs found in reviewed vs. unreviewed code.
This will allow you to determine how effectively your code review process(es) are operating in terms of bugs found.
If you're using Scrum, the backlog. How big is it after each sprint? Is it shrinking at a consistent rate? Or is stuff being pushed into the backlog because of (a) things that weren't thought of to begin with ("We need another use case for an audit report that no one thought of; I'll just add it to the backlog.") or (b) work not getting done and being pushed into the backlog to meet the date instead of delivering the promised features?
http://cccc.sourceforge.net/
Fan in and Fan out are my favorites.
Fan in:
How many other modules/classes use/know this module
Fan out:
How many other modules does this module use/know
improve time estimates
While Joel Spolsky's Evidence-Based Scheduling isn't a metric per se, it sounds like exactly what you want. See http://www.joelonsoftware.com/items/2007/10/26.html
I especially like and use the system that Mary Poppendieck recommends. This system is based on three holistic measurements that must be taken as a package (so no, I'm not going to provide 3 answers):
Cycle time
From product concept to first release or
From feature request to feature deployment or
From bug detection to resolution
Business Case Realization (without this, everything else is irrelevant)
P&L or
ROI or
Goal of investment
Customer Satisfaction
e.g. Net Promoter Score
I don't need more to know if we are in phase with the ultimate goal: providing value to users, and fast.
Number of similar lines (copy/pasted code).
improve my team’s software development process
It is important to understand that metrics can do nothing by themselves to improve your team's software development process. All they can be used for is measuring how well you are advancing toward improving your development process with respect to the particular metric you are using. Perhaps I am quibbling over semantics, but the way you are expressing it is why most developers hate metrics: it sounds like you are trying to use them to drive a result instead of to measure the result.
To put it another way, would you rather have 100% code coverage and lousy unit tests or fantastic unit tests and < 80% coverage?
Your answer should be the latter. You could even want the perfect world and have both, but you had better focus on the unit tests first and let the coverage get there when it does.
Most of the aforementioned metrics are interesting but won't help you improve team performance. The problem is that you're asking a management question in a development forum.
Here are a few metrics: estimates vs. actuals at the project-schedule level and the personal level (see the previous link to Joel's evidence-based method), % defects removed at release (see my blog: http://redrockresearch.org/?p=58), scope creep/month, and overall productivity rating (Putnam's productivity index). Also, developer bandwidth is good to measure.
Every time a bug is reported by the QA team, analyze why that defect escaped the developers' unit testing.
Consider this as a perpetual-self-improvement exercise.
I like the Defect Resolution Efficiency metric. DRE is the ratio of defects resolved prior to software release against all defects found. I suggest tracking this metric for each release of your software into production.
Tracking metrics in QA has been a fundamental activity for quite some time now. But often, development teams do not fully consider how relevant these metrics are to all aspects of the business. For example, typical tracked metrics such as defect ratios, validity, test productivity, code coverage, etc. are usually evaluated in terms of the functional aspects of the software, but few pay attention to how much they matter to the business aspects of the software.
There are also other metrics that can add much value to the business aspects of the software, which is very important when an overall quality view of the software is looked at. These can be broadly classified into:
Needs of the beta users captured by business analysts, marketing and sales folks
End-user requirements defined by the product management team
Ensuring availability of the software at peak loads and ability of the software to integrate with enterprise IT systems
Support for high-volume transactions
Security aspects depending on the industry that the software serves
Availability of must-have and nice-to-have features in comparison to the competition
And a few more….
Code coverage percentage
If you're using Scrum, you want to know how each day's Scrum went. Are people getting done what they said they'd get done?
Personally, I'm bad at it. I chronically run over on my dailies.
Perhaps you can test CodeHealer
CodeHealer performs an in-depth analysis of source code, looking for problems in the following areas:
Audits: quality-control rules such as unused or unreachable code, use of directive names and keywords as identifiers, identifiers hiding others of the same name at a higher scope, and more.
Checks: potential errors such as uninitialised or unreferenced identifiers, dangerous type casting, automatic type conversions, undefined function return values, unused assigned values, and more.
Metrics: quantification of code properties such as cyclomatic complexity, coupling between objects (Data Abstraction Coupling), comment ratio, number of classes, lines of code, and more.
Size and frequency of source control commits.

How to gauge the quality of a software product

I have a product, X, which we deliver to a client, C, every month (including bugfixes, enhancements, new development, etc.). Each month, I am asked to, err, "guarantee" the quality of the product.
For this we use a number of statistics garnered from the tests that we do, such as:
reopen rate (number of bugs reopened/number of corrected bugs tested)
new bug rate (number of new, including regressions, bugs found during testing/number of corrected bugs tested)
for each new enhancement, the new bug rate (the number of bugs found for this enhancement/number of mandays)
and various other figures.
It is impossible, for reasons we shan't go into, to test everything every time.
So, my question is:
How do I estimate the number and type of bugs that remain in my software?
What testing strategies do I have to follow to make sure that the product is good?
I know this is a bit of an open question, but hey, I also know that there are no simple solutions.
Thanks.
I don't think you can ever really estimate the number of bugs in your app. Unless you use a language and process that allows formal proofs, you can never really be sure. Your time is probably better spent setting up processes to minimize bugs than trying to estimate how many you have.
One of the most important things you can do is have a good QA team and good work item tracking. You may not be able to do full regression testing every time, but if you have a list of the changes you've made to the app since the last release, then your QA people (or person) can focus their testing on the parts of the app that are expected to be affected.
Another thing that would be helpful is unit tests. The more of your codebase you have covered, the more confident you can be that changes in one area didn't inadvertently affect another area. I've found this quite useful, as sometimes I'll change something and forget that it affects another part of the app, and the unit tests show the problem right away. Passing unit tests won't guarantee that you haven't broken anything, but they help increase confidence that the changes you make are working.
Also, this is a bit redundant and obvious, but make sure you have good bug tracking software. :)
The question is who requires you to provide the stats.
If it's non-technical people, fake the stats. By "fake", I mean "provide any inevitably meaningless, but real numbers" of the kind you mentioned.
If it's technical people without a CS background, they ought to be told about the halting problem, which is undecidable and is simpler than counting and classifying the remaining bugs.
There are a lot of metrics and tools regarding software quality (code coverage, cyclomatic complexity, coding guidelines and tools enforcing them, etc.). In practice, what works is automating as many tests as possible, having human testers run as many of the tests that weren't automated as possible, and then praying.
I think keeping it simple is the best way to go. Categorize your bugs by severity, and address them in order of decreasing severity.
This way you can hand over the highest-quality build possible (the number of significant bugs remaining is how I would gauge the quality of the product, as opposed to some complex statistics).
Most of the agile methodologies address this dilemma pretty clearly. You can't test everything, and neither can you test it an infinite number of times before you release. So the procedure is to rely on the risk and likelihood of each bug. Both risk and likelihood are numerical values; the product of the two gives you an RPN number. If the number is less than 15, you ship a beta. If you can bring it down to less than 10, you ship the product and push the bug to be fixed in a future release.
How do you calculate risk?
If it's a crash, then it's a 5.
If it's a crash but you can provide a workaround, then it's a number less than 5.
If the bug reduces functionality, then it's a 4.
How do you calculate likelihood?
If you can reproduce it every time you run, it's a 5.
If the workaround provided still causes it to crash, then it's less than 5.
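A minimal sketch of that gate, with the thresholds as stated above (the ratings in the example are invented):

    def ship_decision(bugs):
        # bugs: list of (risk, likelihood) pairs, each rated 1-5
        worst_rpn = max(risk * likelihood for risk, likelihood in bugs)
        if worst_rpn < 10:
            return "ship; push remaining bugs to a future release"
        if worst_rpn < 15:
            return "ship a beta"
        return "keep fixing"

    print(ship_decision([(4, 2), (3, 3)]))   # worst RPN is 9 -> ship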
Well, I am curious to know whether anyone else is using this scheme, and I'm eager to hear what mileage they have gotten out of it.
How long is a piece of string? Ultimately, what makes a quality product? Bugs give some indication, yes, but many other factors are involved; unit test coverage is a key factor IMO. But in my experience, the main factor that affects whether a product can be deemed quality or not is a good understanding of the problem that is being solved. Often the 'problem' that the product is meant to solve is not understood correctly, and developers end up inventing a solution to a problem they have fleshed out in their heads, not the real problem; thus 'bugs' are made. I am a strong proponent of iterative Agile development; that way the product is constantly assessed against the 'problem', and the product does not stray too far from its goal.
The questions I heard were: how do I estimate the bugs in my software, and what techniques do I use to ensure the quality is good?
Rather than go through a full course, here are a couple of approaches.
How do I estimate the bugs in my software?
Start with the history: you know how many bugs you found during testing (hopefully) and you know how many were found after the fact. You can use that to estimate how efficient you are at finding bugs (DDR - Defect Detection Rate is one name for this). If you can show that, over some consistent time period, your DDR is consistent (or improving), you can provide some insight into the quality of the release by estimating the number of post-release defects that will be found once the product is released.
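A minimal sketch of that estimate (the counts and the historical rate are invented):

    found_in_test = 48       # defects found before this release
    historical_ddr = 0.80    # share of all defects historically found pre-release

    estimated_total = found_in_test / historical_ddr
    estimated_field = estimated_total - found_in_test
    print(f"expect roughly {estimated_field:.0f} post-release defects")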
What techniques do I use to ensure the quality is good?
Root cause analysis of your bugs will point you to specific components that are buggy, specific developers who create buggy code, the fact that a lack of full requirements results in implementations not matching expectations, etc.
Project review meetings to quickly identify what went well, so those things can be repeated, and what went badly, so a way can be found to avoid doing them again.
Hopefully, these give you a good start. Good Luck!
It seems the consensus is that the emphasis should be placed on unit testing. Bug tracking is a good indicator of product quality, but it is only as accurate as your test team. If you employ unit testing, it gives you a measurable metric of code coverage and provides regression testing, so you can be assured you didn't break anything since last month.
My company relies on system/integration-level testing. I see a lot of defects being introduced because there is a lack of regression testing. I think 'bugs' where the developer's implementation of the requirements deviates from the user's vision are sort of a separate problem that, as Dan and rptony stated, is best addressed by Agile methodologies.