How to check whether the test case has covered all the requirements other than tracebaility matrix.
There is no tool for this, if this is what you are looking for. Having said that 100% test coverage of requirements in unattainable. Your coverage should always be driven by specific situation, including the desires of the stakeholders. I recommend you read about context driven testing.
You can use a mix of traceability matrix and code coverage to get a reasonable idea if you want.
Related
There are so many Hyperloglog implementation out there, but how do you verify / test Hyperloglog implementation? To check it's "accuracy", it's "error" bound behavior? Just throwing some static test cases looks very ineffective.
More concrete, someone changes the random number routine, how do I know that is not a disastrous choice and show with some automated, repeatable tests?
Can anyone point me to any known good tests in github or other place, and may be some explanations?
Good question. First, note that while HyperLogLog's theoretical foundation offers some indication of accuracy, it is critical to test the implementation you are using.
Testing should use random datasets (additional static datasets are also possible), and should be applied across varying set cardinalities. If you have any test automation framework in place, that would be a natural place to ensure avoiding regression, as you suggested above. However, note that to measure accuracy with large cardinalities, test runtime might be prohibitive.
You can use the implementation below for reference. It includes unit tests which draw large numbers of random numbers, and check the accuracy at fixed intervals.
https://github.com/Microsoft/CardinalityEstimation
What are the different metrics we use for assuring the quality of test suites written based on only requirements and specifications (black box)?
Simply put, given a set of requirements and a test suite on those requirements, what are different metrics to quantify the quality of specification/requirement based testing (test suite)?
I read through the following articles regarding specification based testing and metrics to define them. These topics are too abstract to digest.
http://link.springer.com/chapter/10.1007%2F978-3-642-21768-5_13#page-1
http://www.worldscientific.com/doi/abs/10.1142/S0218539301000530
Can you please explain in simple terms?
Thanks!
The simplest way to evaluate specification-based testing is to trace each specification to a test (whether manual or automated), count which specifications are tested and which are not, and calculate percent coverage.
The confusion related to the articles you linked to is due to confusion between "specification" used to refer to a human-written, structured but relatively informal document, and "specification" meaning a formal computer-readable specification from which tests can be automatically derived.
It's also possible to measure code coverage during specification-based testing. However, it's very difficult to improve coverage without looking inside the black box. Also, specification-based tests are slow, even when automated, so it's painful to achieve code coverage using only specification-based tests. A better approach is to combine black-box specification-based tests and white-box unit tests and consider overall coverage.
I am currently working with an existing implementation of Perlin noise, which came bundled with a bunch of code I am trying to clean up. The code in question is severely under-tested, and I would like to make sure that each of its components receives proper testing in case there are any hidden bugs.
However, I am not sure how I would go about testing the correctness of the Perlin noise implementation in this case. I welcome all suggestions.
This is a tough problem & probably doesn't have a single best solution.
For some images properties, you might be able to perform automated tests using Computer Vision techniques. I.E. if your Perlin noise output is supposed to be tile-able, an edge detection filter might be able to catch problems. I've also had some good results using FFT filters when I was working on an image classifier for perlin noise based wood grain textures. In my experience, implementing such tests can easily take more time then building the code being tested. To minimize that, I'd stick with libraries like OpenCV, Octave, etc. Also, this approach depends on having known good output in order to build your tests.
From a certain perspective, Perlin noise is a type of random number generator. To that end, you might be able to use RNG test suites like the NIST Statistical Test Suite or the Diehard tests. This approach depends on having known good output in order to build your tests.
Finally, you could build tests that output the results to file & then perform a manual confirmation of each against expected results. For convenience, you could load collections of images via a web page & maybe even integrate reporting check boxes to collect pass/fail responses from your tester. This solution is the best I've come up with for testing properties that are difficult, impossible or impracticable to quantify. I.E. I only know my particle effect is correct when I see it.
If someone has a webpage, the usual way of testing the web site for user interaction bugs is to create each test case by hand and use selenium.
Is there a tool to create these testcases automatically? So if I have a webpage that gets altered, new test cases get created automatically?
You can look at a paid product. That type of technology is not being developed as open source and will probably cost a bit. Some of the major test tools get closer to this, but full auto I have not heard of.
If this was the case the role of QA Engineer and especially Automation Engineer would not be as important and the jobs would spike downwards pretty quickly. I would imagine that if such a tool was out there that it would be breaking news to the entire industry and be world wide.
If you go down the artificial intelligence path this is possible in theory and concept, however, usually artificial intelligence development efforts costs more than the app being developed that needs the testing, so...that's not going to happen.
The best to do at this point is separate out as much of the maintenance into a single section from the rest so you limit the maintenance headache when modyfying and keep a core that stays the same. I usually focus on control manipulation as generic and then workflow and specific maps and data change. That will allow it to function against any website...but you still have to write/update the tests and maintain the maps.
I think Growing Test Cases Automatically is more of what your asking. To be more specific I'll try to introduce basics and if you're interested take a closer look at Evolutionary Testing
Usually there is a standard set of constraints we meet like changing functionality of the system under test (SUT), limited timeframe, lack of appropriate test tools and the list goes on… Yet there is another type of challenge which arises as technological solutions progress further – increase of system complexity.
While the typical constraints are solvable through different technical and management approaches, in the case of system complexity we are facing the limit of our capability of defining a straight-forward analytical method for assessing and validating system behavior. Complex system consist of multiple, often heterogeneous components which when working together amplify each other’s statistical and behavioral deviations, resulting in a system which acts in ways that were not part of its initial design. To make matter worse, complex systems increase sensitivity to their environment as well with the help of the same mechanism.
Options for testing complex systems
How can we test a system which behaves differently each time we run a test scenario? How can we reproduce a problem which costs days and millions to recover from, but happens only from time to time under conditions which are known just approximately?
One possible solution which I want to focus on is to embrace our lack of knowledge and work with whatever we have by using evolutionary testing. In this context the evolutionary testing can be viewed as a variant of black-box testing, because we are working with feeding input into and evaluating output from a SUT without focusing on its internal structure. The fine line here is that we are organizing this process of automatic test case generation and execution on a massive scale as an iterative optimization process which mimics the natural evolution.
Evolutionary testing
Elements:
• Population – set of test case executions, participating into the optimization process
• Generation – the part of the Population, involved into given iteration
• Individual – single test case execution and its results, an element from the Population
• Genome – unified definition of all test cases, model describing the Population
• Genotype – a single test case instance, a model describing an Individual, instance of the Genome
• Recombination – transformation of one or more Genotypes into a new Genotype
• Mutation – random change in a Genotype
• Fitness Function – formalized criterion, expressing the suitability of the Individual against the goal of the optimization
How we create these elements?
• Definition of the experiment goal (selection criteria) – sets the direction of the optimization process and is related to the behavior of the SUT. Involves certain characteristics of SUT state or environment during the performed test case experiments. Examples:
o “SUT should complete the test case execution with an error code”
o “The test case should drive the SUT through the largest number of branches in SUT’s logical structure”
o “Ambient temperature in the room where SUT is situated should not exceed 40 ºC during test case execution”
o “CPU utilization on the system, where SUT runs should exceed 80% during test case execution”
Any measurable parameters of SUT and its environment could be used in a goal statement. Knowledge of the relation between the test input and the goal itself is not obligatory. This gives a possibility to cover goals which are derived directly from requirements, rather than based on some late requirement derivative like business, architectural or technical model.
• Definition of the relevant inputs and outputs of the tested system – identification of SUT inputs and outputs, as well as environment parameters, relevant to the experiment goal.
• Formal definition of the experiment genome – encoding the summarized set of test cases into a parameterized model (usually a data structure), expressing relevant SUT input data, environment parameters and action sequences. This definition also needs to comply with the two major operations applied over genome instances – recombination and mutation. The mechanism for those two operations can be predefined for the type of data or action present in the genome or have custom definitions
• Formal definition of the selection criteria (fitness function) – an evaluation mechanism which takes SUT output or environment parameters resulting from a test case execution (Individual) and calculates a number (Fitness), signifying how close is this particular Individual to the experiment goal.
How the process works?
We use the Genome to create a Generation of random Genotypes (test case instances).
We execute the test cases (Genotypes) generating results (Individuals)
We evaluate each execution result (Individual) against our goal using the Fitness Function
We select only those Individuals from given Generation which have Fitness above a given threshold (the top 10 %, above the average, etc.)
We use the selected individuals to produce a new, full Generation set by applying Recombination and Mutation
We repeat the process, returning on step 2
The iteration process usually stops by setting a condition with regard to the evaluated Fitness of a Generation. For example:
• If the top Fitness hasn’t changed with more than 0.1% since the last Iteration
• If the difference between the top and the bottom Fitness in a Generation is less than 0.3%
then probably it is time to stop.
Upsides and downsides
Upsides:
• We can work with limited knowledge for the SUT and goal-oriented test definitions
• We use a test case model (Genome) which allows us to mass-produce a large number of test cases (Genotypes) with little effort
• We can “seed” test cases (Genotypes) in the first iteration instead of generating them at random in order to speed up the optimization process.
• We could run test cases in parallel in order to speed up the process
• We could find multiple solutions which meet our test goal
• If the optimization process in convergent we have a guarantee that each following Generation is a better approximate solution of our test goal. This means that even if we need to stop before we have reached optimal Fitness we will still have better test cases than the one we started with.
• We can achieve replay of very complex, hard to reproduce test scenarios which mimic real life and which are far beyond the reach of any other automated or manual testing technique.
Downsides:
• The process of defining the necessary elements for evolutionary test implementation is non-trivial and requires specific knowledge.
• Implementing such automation approach is time- and resource-consuming and should be employed only when it is justifiable.
• The convergence of the optimization process depends on the smoothness of the Fitness Function. If its definition results in a zones of discontinuity or small/no gradient then we can expect slow or no convergence
Update:
I also recommend you to look at Genetic algorithms and this article about Test data generation can give you approaches and guidelines.
I happen to develop ecFeed - an open-source tool that may assist in test design. It's in pre-release phase and we are going to add better integration with Selenium, but you may have a look at the current snapshot: https://github.com/testify-no/ecFeed/wiki . The next version should arrive in October and will have major improvements in usability. Anyway, I am looking forward for constructive criticism.
In the Microsoft development world there is Visual Studio's Coded UI Test framework. This will record your actions in a web browser and generate test cases to replicate that use case. It won't update test cases with any changes to code though, you would need to update them manually or re-generate.
I was taught that a regression test was a small (only enough to prove you didn't break anything with the introduction of a change or new modules) sample of the overall tests. However, this article by Ron Morrison and Grady Booch makes me think differently:
The desired strategy would be to bring each unit in one at a time, perform an extensive regression test, correct any defects and then proceed to the next unit.
The same document also says:
As soon as a small number of units are added, a test version is generated and "smoke tested," wherein a small number of tests are run to gain confidence that the integrated product will function as expected. The intent is neither to thoroughly test the new unit(s) nor to completely regression test the overall system.
When describing smoke testing, the authors say this:
It is also important that the Smoke Test perform a quick check of the entire system, not just the new component(s).
I've never seen "extensive" and "regression test" used together nor a regression test described as "completely regression test the overall system". Regression tests are supposed to be as light and quick as possible. And the definition of smoke test is what I learned a regression test was.
Did I misunderstand what I was taught? Was I taught incorrectly? Or are there multiple interpretations of "regression test"?
There are multiple interpretations. If you're only fixing a bug that affects one small part of your system then regression tests might only include a small suite of tests that exercise the class or package in question. If you're fixing a bug or adding a feature that has wider scope then your regression tests should have wider scope as well.
The "if it could possibly break, test it" rule of thumb applies here. If a change in Foo could affect Bar, then run the regressions for both.
Regression tests just check to see if a change caused a previously passed test to fail. They can be run at any level (unit, integration, system). Reference.
I always took regression testing to mean any tests whose purpose was to ensure that existing functionality is not broken by new changes. That would not imply any constraint on the size of the test suite.
Regression is generally used to refer to the whole suite of tests. It is the last thing QA does before a release. It is used to show that everything that used to work still works, to the extent that that is possible to show. In my experience, it is generally a system-wide set of tests regardless of how small the change was (although small changes may not trigger a regression test).
Where I work, regression tests are standardized for each application at the end of each release. They are intended to test all functionality, but they are not designed to catch subtle bugs. So if you have a form that has various kinds of validation done on it, for example, a regression suite for that form would be to confirm that each type of validation gets done (field level and form level) and that correct information can be submitted. It is not designed to cover every single case (i.e. what if I leave field A blank? How about field B? it will just test one of them and assume the others work).
However, on the current project I'm working on, the regression tests are much more thorough, and we have noticed a reduction in the number of defects being raised during testing. Those two are not necessarily related, but we do notice it fairly consistently.
my understanding of the term 'regression testing' is:
unit tests are written to test features when the system is created
when bugs are discovered, more unit tests are written to reproduce the bug and verify that it has been corrected
a regression test runs the entire set of tests prove that everything still works including that no old bugs have reappeared [i.e. to prove that the code has not "regressed"]
in practice, it is best to always run all existing unit tests when changes are made. the only time i'd bother with a subset of tests is when the full unit test suite takes "too long" to run [where "too long" is fairly subjective]
Start with what you are trying to accomplish. Then do what you need to do to accomplish that goal. And then use buzzword bingo to assign a word to what you actually do. Just like everyone else :-) Accuracy isn't all that important.
... regression test was a small (only enough to prove you didn't break anything with the introduction of a change or new modules) sample of the overall tests
If a small sample of tests is enough to prove that the system works, why do the rest of the tests even exist? And if you think you know that your change only affected a subset of functionality, then why do you need to test anything after making the change? Humans are fallible, nobody really knows if changing something breaks something else. IMO, if your tests are automated, re-run them all. And if they aren't automated, automate them. In the mean time, re-run whatever is automated.
In general, a subset of the feature tests for the new feature introduced in version X of a product becomes the basis of the regression tests for version X+1, X+2, and so on. Over time, you may reduce the time taken by the feature/regression tests of stable features which have not suffered from regressions. If a feature suffers from lots of regressions, then it may be beneficial to increase the emphasis on the feature.
I think that the article referring to 'extensive regression test' means run an extensive set of (individually simple) regression tests.