I'm making my own Sudoku solver. Are there any open-source test cases that I can use to test the efficiency of my algorithm? I don't just want randomized test cases; I want test cases that are marked easy, medium, and hard.
Thank you very much!
You can find some large datasets for Sudoku benchmarking and testing in this project: https://github.com/t-dillon/tdoku
See data.zip for the puzzles.
See https://github.com/t-dillon/tdoku/blob/master/benchmarks/README.md for descriptions of the datasets, their sources, and their difficulties.
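If it helps, here is a minimal benchmarking harness sketch in Python. It assumes the common one-puzzle-per-line, 81-character format with '.' or '0' for empty cells; the exact file layout and names are an assumption on my part, so check them against the tdoku README.

```python
# Minimal harness sketch for timing a solver over a benchmark puzzle file.
# Assumed format: one puzzle per line, 81 characters, '.' or '0' for blanks.
import time

def load_puzzles(path):
    with open(path) as f:
        for line in f:
            line = line.strip()
            if len(line) == 81 and not line.startswith('#'):
                yield line

def benchmark(solve, path):
    """Time solve(puzzle_string) over every puzzle in the file."""
    count, total = 0, 0.0
    for puzzle in load_puzzles(path):
        start = time.perf_counter()
        solve(puzzle)
        total += time.perf_counter() - start
        count += 1
    return count, total
```

Running the same harness over the easy, medium, and hard files separately gives you a per-difficulty profile of your solver.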
I am looking to implement test suite optimization using a Genetic Algorithm (GA). There are multiple IEEE research papers suggesting this approach, but I need help implementing it (in Python).
These are my GA steps:
Representation: How to represent test case features in GA?
Initialize and evaluate (fitness function): How to create a fitness function, and with what formula?
Parent selection, crossover and mutation: Running through the GA
Output: Inferring from the output
I've done a lot of research on this, but I could not get to the point where I can implement it.
For test case optimization (selecting the best test cases) I use Python GA packages.
Any help is much appreciated.
That's what I've been looking for as well.
Representation is the first issue, but we can assign an ID to each step in the test case.
The fitness function could be a grade of how much that step covers, combined with the time spent executing it.
Selection can be random, or we can specify a pre/post step to use in a test case
(each step can have a pre/post step).
Then we can generate lots of test cases containing steps
and keep the best ones in each generation based on that fitness function.
I'm still working on some coding. We should be in contact.
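To make those steps concrete, here is a rough sketch of a bit-string GA for test case selection in Python. The coverage sets, execution costs, fitness weights, and population/generation sizes are all made-up placeholders, and the bit-string encoding is just one common representation, not necessarily the one from the papers.

```python
# GA sketch for test suite selection: each individual is a 0/1 vector over test case IDs.
# Fitness rewards requirement coverage and penalizes total execution time.
import random

coverage = [{1, 2}, {2, 3}, {3, 4, 5}, {1, 5}, {4}]   # hypothetical: reqs covered by each test
cost = [2.0, 1.5, 3.0, 1.0, 0.5]                      # hypothetical: execution time per test
N = len(coverage)

def fitness(individual):
    selected = [i for i in range(N) if individual[i]]
    covered = set().union(*(coverage[i] for i in selected))
    time_spent = sum(cost[i] for i in selected)
    return len(covered) - 0.1 * time_spent            # weights are arbitrary; tune them

def select(population):
    a, b = random.sample(population, 2)               # size-2 tournament selection
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2):
    point = random.randrange(1, N)                    # single-point crossover
    return p1[:point] + p2[point:]

def mutate(individual, rate=0.1):
    return [bit ^ 1 if random.random() < rate else bit for bit in individual]

population = [[random.randint(0, 1) for _ in range(N)] for _ in range(20)]
for generation in range(50):
    population = [mutate(crossover(select(population), select(population)))
                  for _ in range(len(population))]

best = max(population, key=fitness)
print("selected test cases:", [i for i in range(N) if best[i]])
```

A GA library (e.g. DEAP) would replace the selection/crossover/mutation plumbing; the parts you still have to design yourself are the representation and the fitness function.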
There are so many HyperLogLog implementations out there, but how do you verify/test a HyperLogLog implementation? How do you check its accuracy and its error-bound behavior? Just throwing some static test cases at it looks very ineffective.
More concretely: if someone changes the random number routine, how do I know that is not a disastrous choice, and how do I show it with automated, repeatable tests?
Can anyone point me to any known good tests on GitHub or elsewhere, and maybe some explanations?
Good question. First, note that while HyperLogLog's theoretical foundation offers some indication of accuracy, it is critical to test the implementation you are using.
Testing should use random datasets (additional static datasets are also possible) and should be applied across varying set cardinalities. If you have a test automation framework in place, that is a natural place to guard against regressions, as you suggested above. However, note that measuring accuracy at large cardinalities can make test runtime prohibitive.
You can use the implementation below for reference. It includes unit tests which draw large quantities of random values and check the accuracy at fixed intervals.
https://github.com/Microsoft/CardinalityEstimation
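As a rough illustration of the kind of check those unit tests perform, here is a sketch in Python. The make_hll factory and the add/count methods are a hypothetical interface, so adapt them to whatever implementation you are testing; the tolerance is based on HyperLogLog's theoretical relative standard error of about 1.04/sqrt(m) for m registers.

```python
# Accuracy check sketch for a HyperLogLog implementation.
# make_hll(precision_bits) -> object with add(value) and count() is an assumed interface.
import math
import random

def check_hll_accuracy(make_hll, precision_bits=14,
                       checkpoints=(1_000, 10_000, 100_000, 1_000_000)):
    random.seed(42)                        # fixed seed keeps the check repeatable
    m = 2 ** precision_bits                # number of registers
    tolerance = 4 * (1.04 / math.sqrt(m))  # a few multiples of the theoretical std error

    hll = make_hll(precision_bits)
    for n in range(1, max(checkpoints) + 1):
        hll.add(random.getrandbits(64))    # random 64-bit values: duplicates are negligible
        if n in checkpoints:
            estimate = hll.count()
            rel_error = abs(estimate - n) / n
            assert rel_error < tolerance, (
                f"cardinality {n}: estimate {estimate}, "
                f"relative error {rel_error:.4f} exceeds {tolerance:.4f}")
```

Because the seed is fixed, the check is repeatable, so a change to the random number routine that blows the error bound fails the build instead of going unnoticed.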
What are the different metrics used to assess the quality of test suites written based only on requirements and specifications (black box)?
Simply put: given a set of requirements and a test suite based on those requirements, what metrics quantify the quality of that specification/requirement-based testing (test suite)?
I read through the following articles on specification-based testing and the metrics defined for it, but the topics are too abstract for me to digest.
http://link.springer.com/chapter/10.1007%2F978-3-642-21768-5_13#page-1
http://www.worldscientific.com/doi/abs/10.1142/S0218539301000530
Can you please explain in simple terms?
Thanks!
The simplest way to evaluate specification-based testing is to trace each specification to a test (whether manual or automated), count which specifications are tested and which are not, and calculate percent coverage.
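As a toy illustration, that calculation is just a traceability mapping plus counting. The data layout below is a hypothetical example, not a prescribed format; in practice the mapping might come from a test management tool or a spreadsheet export.

```python
# Requirements-coverage sketch: trace each requirement to the tests that cover it.
requirements = ["REQ-1", "REQ-2", "REQ-3", "REQ-4"]          # hypothetical example data
traceability = {
    "REQ-1": ["TC-10", "TC-11"],
    "REQ-2": ["TC-12"],
    "REQ-3": [],                                             # nothing traces to this one
    "REQ-4": ["TC-13"],
}

covered = [r for r in requirements if traceability.get(r)]
uncovered = [r for r in requirements if not traceability.get(r)]
percent_coverage = 100.0 * len(covered) / len(requirements)

print(f"Requirements coverage: {percent_coverage:.0f}%")     # 75% for this example
print("Untested requirements:", uncovered)                   # ['REQ-3']
```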
The confusion around the articles you linked to comes from two different senses of "specification": a human-written, structured but relatively informal document, versus a formal, computer-readable specification from which tests can be automatically derived.
It's also possible to measure code coverage during specification-based testing. However, it's very difficult to improve coverage without looking inside the black box. Also, specification-based tests are slow, even when automated, so it's painful to achieve code coverage using only specification-based tests. A better approach is to combine black-box specification-based tests and white-box unit tests and consider overall coverage.
I am currently working with an existing implementation of Perlin noise, which came bundled with a bunch of code I am trying to clean up. The code in question is severely under-tested, and I would like to make sure that each of its components receives proper testing in case there are any hidden bugs.
However, I am not sure how I would go about testing the correctness of the Perlin noise implementation in this case. I welcome all suggestions.
This is a tough problem & probably doesn't have a single best solution.
For some image properties, you might be able to perform automated tests using computer vision techniques. For example, if your Perlin noise output is supposed to be tileable, an edge detection filter might be able to catch problems. I've also had some good results using FFT filters when I was working on an image classifier for Perlin-noise-based wood grain textures. In my experience, implementing such tests can easily take more time than building the code being tested. To minimize that, I'd stick with libraries like OpenCV, Octave, etc. Also, this approach depends on having known good output in order to build your tests.
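For the tileability property specifically, you may not even need computer vision: a direct numerical check on the sampler can catch a broken wrap. Here is a sketch in Python; noise(x, y) is a placeholder for your sampler, and the tile period and the [-1, 1] value range are assumptions to adjust to your implementation's actual contract.

```python
# Simple property checks for a Perlin noise sampler; noise(x, y) is a placeholder.
import numpy as np

def check_tileable(noise, period=256.0, samples=200, tol=1e-6):
    """If the noise claims to tile with the given period, sampling one period
    apart must give (numerically) identical values."""
    rng = np.random.default_rng(0)
    for x, y in rng.uniform(0.0, period, size=(samples, 2)):
        assert abs(noise(x, y) - noise(x + period, y)) < tol
        assert abs(noise(x, y) - noise(x, y + period)) < tol

def check_value_range(noise, samples=10_000, low=-1.0, high=1.0):
    """Classic Perlin noise is usually bounded; check values stay in range and
    are not degenerate (e.g. all zero)."""
    rng = np.random.default_rng(1)
    values = np.array([noise(x, y) for x, y in rng.uniform(0.0, 1000.0, size=(samples, 2))])
    assert values.min() >= low and values.max() <= high
    assert values.std() > 0.01   # a completely flat output almost certainly means a bug
```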
From a certain perspective, Perlin noise is a type of random number generator. To that end, you might be able to use RNG test suites like the NIST Statistical Test Suite or the Diehard tests. This approach depends on having known good output in order to build your tests.
Finally, you could build tests that output the results to file & then perform a manual confirmation of each against expected results. For convenience, you could load collections of images via a web page & maybe even integrate reporting check boxes to collect pass/fail responses from your tester. This solution is the best I've come up with for testing properties that are difficult, impossible or impractical to quantify, e.g. I only know my particle effect is correct when I see it.
I was taught that a regression test was a small (only enough to prove you didn't break anything with the introduction of a change or new modules) sample of the overall tests. However, this article by Ron Morrison and Grady Booch makes me think differently:
The desired strategy would be to bring each unit in one at a time, perform an extensive regression test, correct any defects and then proceed to the next unit.
The same document also says:
As soon as a small number of units are added, a test version is generated and "smoke tested," wherein a small number of tests are run to gain confidence that the integrated product will function as expected. The intent is neither to thoroughly test the new unit(s) nor to completely regression test the overall system.
When describing smoke testing, the authors say this:
It is also important that the Smoke Test perform a quick check of the entire system, not just the new component(s).
I've never seen "extensive" and "regression test" used together, nor a regression test described as something that would "completely regression test the overall system". Regression tests are supposed to be as light and quick as possible, and the definition of smoke testing here is what I learned a regression test was.
Did I misunderstand what I was taught? Was I taught incorrectly? Or are there multiple interpretations of "regression test"?
There are multiple interpretations. If you're only fixing a bug that affects one small part of your system then regression tests might only include a small suite of tests that exercise the class or package in question. If you're fixing a bug or adding a feature that has wider scope then your regression tests should have wider scope as well.
The "if it could possibly break, test it" rule of thumb applies here. If a change in Foo could affect Bar, then run the regressions for both.
Regression tests just check to see whether a change caused a previously passing test to fail. They can be run at any level (unit, integration, system). Reference.
I always took regression testing to mean any tests whose purpose was to ensure that existing functionality is not broken by new changes. That would not imply any constraint on the size of the test suite.
Regression is generally used to refer to the whole suite of tests. It is the last thing QA does before a release. It is used to show that everything that used to work still works, to the extent that that is possible to show. In my experience, it is generally a system-wide set of tests regardless of how small the change was (although small changes may not trigger a regression test).
Where I work, regression tests are standardized for each application at the end of each release. They are intended to test all functionality, but they are not designed to catch subtle bugs. So if you have a form that has various kinds of validation done on it, for example, a regression suite for that form would confirm that each type of validation gets done (field level and form level) and that correct information can be submitted. It is not designed to cover every single case (e.g. what if I leave field A blank? How about field B? It will just test one of them and assume the others work).
However, on the current project I'm working on, the regression tests are much more thorough, and we have noticed a reduction in the number of defects being raised during testing. Those two are not necessarily related, but we do notice it fairly consistently.
my understanding of the term 'regression testing' is:
unit tests are written to test features when the system is created
when bugs are discovered, more unit tests are written to reproduce the bug and verify that it has been corrected
a regression test runs the entire set of tests to prove that everything still works, including that no old bugs have reappeared [i.e. to prove that the code has not "regressed"]
In practice, it is best to always run all existing unit tests when changes are made. The only time I'd bother with a subset of tests is when the full unit test suite takes "too long" to run [where "too long" is fairly subjective].
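If the full suite really does take too long, one common way to carve out the quick subset is with test markers. Here is an illustrative sketch using pytest markers; the marker name and example tests are placeholders.

```python
# Sketch: tag a fast subset with a pytest marker so "pytest -m smoke" runs it on
# every change, while plain "pytest" runs the full regression suite less often.
# Register the "smoke" marker in pytest.ini or pyproject.toml to avoid warnings.
import pytest

def add(a, b):                       # stand-in for real application code
    return a + b

@pytest.mark.smoke                   # fast check, run on every change
def test_add_basic():
    assert add(2, 3) == 5

def test_add_many_inputs():          # slower, exhaustive-style check for the full pass
    assert all(add(i, i) == 2 * i for i in range(100_000))
```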
Start with what you are trying to accomplish. Then do what you need to do to accomplish that goal. And then use buzzword bingo to assign a word to what you actually do. Just like everyone else :-) Accuracy isn't all that important.
... regression test was a small (only enough to prove you didn't break anything with the introduction of a change or new modules) sample of the overall tests
If a small sample of tests is enough to prove that the system works, why do the rest of the tests even exist? And if you think you know that your change only affected a subset of functionality, then why do you need to test anything after making the change? Humans are fallible; nobody really knows whether changing something breaks something else. IMO, if your tests are automated, re-run them all. And if they aren't automated, automate them. In the meantime, re-run whatever is automated.
In general, a subset of the feature tests for the new feature introduced in version X of a product becomes the basis of the regression tests for version X+1, X+2, and so on. Over time, you may reduce the time taken by the feature/regression tests of stable features which have not suffered from regressions. If a feature suffers from lots of regressions, then it may be beneficial to increase the emphasis on the feature.
I think that when the article refers to an 'extensive regression test', it means running an extensive set of (individually simple) regression tests.