How can we get the time of individual test cases in DejaGnu?

I am running the GCC testsuite and I want to know the time elapsed for each individual test case. GCC uses DejaGnu for its test suite, and I know that time can be used in the test scripts to measure a test case. I am wondering if there is any flag I can pass to runtest that forces timing for all test cases (without changing the test scripts).

I don't know of a generic way.
DejaGnu does not really have a built-in notion of the boundaries of a test. It's reasonably common for a single conceptual test to call "pass" or "fail" several times: in GCC, for example, a compilation test may check for several warnings from a given source file, and each separate warning, as well as the check for excess warnings, is a separate pass or fail, even though they all arise from a single invocation of GCC.
I think there are two approaches that you can take.
You can hack the .exp files you care about and use knowledge of what they are doing to track the times you are interested in.
You can run a single .exp file in isolation and time how long it takes. This is less useful in general, but it is what I did when making the GDB test suite more fully parallelizable.
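For the second approach, a small wrapper can time each .exp file when it is run in isolation. The following Python sketch assumes it is run from a GCC build directory and that selecting a single DejaGnu script with make check-gcc RUNTESTFLAGS="<name>.exp" is appropriate for the scripts you care about; the list of scripts is a placeholder and the selection syntax may need adjusting for your tree.

#!/usr/bin/env python3
"""Rough per-.exp timing for the GCC testsuite (a sketch, not a drop-in tool).

Assumptions: run from the GCC build directory, and selecting a single
DejaGnu script with  make check-gcc RUNTESTFLAGS="<name>.exp"  is valid
for the scripts listed below.
"""
import subprocess
import time

EXP_SCRIPTS = ["compile.exp", "dg.exp"]  # placeholder selection

for exp in EXP_SCRIPTS:
    start = time.monotonic()
    subprocess.run(["make", "-s", "check-gcc", f"RUNTESTFLAGS={exp}"],
                   check=False)
    print(f"{exp}: {time.monotonic() - start:.1f}s")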

Related

Running Google Test cases non parallel

Because of resource exhaustion there is a need to run the test cases serially, without threads (these are integration tests for CUDA code). I went through the source code (e.g. tweaking GetThreadCount()) trying to find other ways to make the gmock/gtest framework run tests serially, but found no way out.
At first I did not find any command-line arguments that could influence this either. It feels like the only way out is to create many binaries, or to create a script that uses --gtest_filter. I would not like to mess with hidden synchronization primitives between test cases.
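One way to realize the --gtest_filter workaround mentioned in the question is a small driver that lists the tests in a binary and runs them one at a time, each in its own process. A minimal Python sketch follows; the binary path is a placeholder and the parsing of --gtest_list_tests output is deliberately simple (comments after '#' are stripped).

#!/usr/bin/env python3
"""Run every test of a Google Test binary serially, one process per test.

Sketch only: TEST_BINARY is a placeholder, and the parsing of
--gtest_list_tests output is simplified.
"""
import subprocess
import sys

TEST_BINARY = "./cuda_integration_tests"  # placeholder path

listing = subprocess.run([TEST_BINARY, "--gtest_list_tests"],
                         capture_output=True, text=True, check=True).stdout

tests, suite = [], None
for line in listing.splitlines():
    name = line.split("#", 1)[0].rstrip()  # drop trailing comments
    if not name:
        continue
    if not name.startswith(" "):           # "SuiteName."
        suite = name.strip()
    else:                                  # "  TestName"
        tests.append(suite + name.strip())

failures = 0
for test in tests:
    result = subprocess.run([TEST_BINARY, f"--gtest_filter={test}"])
    failures += result.returncode != 0

sys.exit(1 if failures else 0)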

How to make CTest run a few tests within one executable

We have over 10 Google Test executables which, in sum, provide over 5000 test cases.
So far, our CTest configuration used the add_test command, which resulted in each test executable being a single test from CTest's point of view.
Recently, we used the GoogleTest module to replace the add_test command with the gtest_discover_tests command. This way all individual test cases are now visible to CTest. Now each of those over 5000 test cases is a separate test in CTest.
This worked quite nicely, improving parallel runs (a strong machine runs far more test cases than we have test executables) and allowing us to use the CTest command-line interface to filter test cases etc., abstracting away the testing framework used.
However, we hit a major blocker for Valgrind runs! Now each individual test case is run separately, causing the Valgrind machinery to be set up and torn down over 5000 times. The time shot up from around 10 minutes for the full suite to almost 2 hours, which is unacceptable.
Now, I'm wondering whether there is any way to make CTest run tests in batches from the same executable, invoking the executable only once. We would do this for Valgrind runs but not for ordinary runs. I'm afraid there is no way, especially since it would probably require the GoogleTest module to somehow support this. But maybe someone has already had a similar issue and solved it somehow?
I know a workaround would be to skip CTest for Valgrind runs. Just take the test executables and run them "manually" under Valgrind. Doable, probably also in an automated way (so the list of test executables is somehow "queried", perhaps with the --show-only argument to ctest, rather than hardcoded). But it makes the interface (command line, output, etc.) less consistent.
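For the automated variant of that workaround, ctest's machine-readable listing can be used to recover the set of test executables. Below is a rough Python sketch, assuming a CTest version that supports --show-only=json-v1 and that the first element of each test's command is the test executable (as it is for tests registered by gtest_discover_tests); the valgrind options are placeholders.

#!/usr/bin/env python3
"""Run each test executable once under Valgrind, instead of once per test case.

Sketch only: assumes 'ctest --show-only=json-v1' is available and that the
first element of every test's "command" is the test executable itself.
"""
import json
import subprocess

info = json.loads(subprocess.run(
    ["ctest", "--show-only=json-v1"],
    capture_output=True, text=True, check=True).stdout)

executables = []
for test in info.get("tests", []):
    command = test.get("command") or []
    if command and command[0] not in executables:
        executables.append(command[0])

for exe in executables:
    # Placeholder valgrind options; reuse whatever options your
    # CTest memcheck configuration normally passes.
    subprocess.run(["valgrind", "--error-exitcode=1", exe], check=False)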

How to limit the number of test threads in Cargo.toml?

I have tests which share a common resource and can't be executed concurrently. These tests fail with cargo test, but work with RUST_TEST_THREADS=1 cargo test.
I can modify the tests to wait on a global mutex, but I don't want to clutter them if there is a simpler way to force cargo to set this environment variable for me.
As of Rust 1.18, there is no such thing. In fact, there is not even a simpler option to disable parallel testing.
However, what might help you is cargo test -- --test-threads=1, which is the recommended alternative to the RUST_TEST_THREADS environment variable. Keep in mind that this only sets the number of threads used for running tests; it is in addition to the main thread.

Does there exist an established standard for testing command line arguments?

I am developing a command line utility that has a LOT of flags. A typical command looks like this:
mycommand --foo=A --bar=B --jar=C --gnar=D --binks=E
In most cases, a 'success' message is printed but I still want to verify against other sources like an external database to ensure actual success.
I'm starting to create integration tests and I am unsure of the best way to do this. My main concerns are:
There are many, many flag combinations; how do I know which combinations to test? If you do the math for the 10+ flags that can be used together...
Is it necessary to test permutations of flags?
How to build a framework capable of automating the tests and then verifying results.
How to keep track of a large number of flags and provide an ordering, so it is easy to tell which combinations have been covered and which have not.
The thought of manually writing out individual cases and verifying results in a unit-test like format is daunting.
Does anyone know of a pattern that can be used to automate this type of test? Perhaps even software that attempts to solve this problem? How did people working on GNU commandline tools test their software?
I think this is very specific to your application.
First, how do you determine the success of an execution of your application? Is it a result code? Is it something printed to the console?
For your second concern, about permutations, it depends on how you parse those flags in your application. Most of the time, the order of flags isn't important, but there are cases where it is. I hope you don't need to test for permutations of flags, because that would add a lot of cases to test.
In the general case, you should analyse the impact of each flag. It is possible that a flag doesn't interfere with the others, in which case it only needs to be tested once. The same goes for flags that are meant to be used alone (--help or --version, for example). You also need to analyse which values you should test for each flag. Usually, you want to try each kind of valid value and each kind of invalid value.
I think a simple script could be written to perform the tests, in bash or any scripting language, such as Python. Using nested loops, you could try, for each flag, its possible values, including invalid values and the case where the flag isn't set. This will produce a multidimensional matrix of results that should then be analysed to see whether the results conform to what is expected.
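To make the nested-loop idea concrete, here is a small Python sketch that picks a value (or absence) for each flag, runs the command, and records pass/fail per combination. The command name, flags, values and the external verification stub are all made up and need to be replaced with your real ones.

#!/usr/bin/env python3
"""Enumerate flag combinations for a CLI tool and record the results.

Sketch with made-up flags/values; replace 'mycommand', the flag table and
verify_externally() with your real tool, flags and database check.
"""
import itertools
import subprocess

FLAG_VALUES = {                      # None means "flag not passed"
    "--foo": [None, "A", "bad-value"],
    "--bar": [None, "B"],
    "--jar": [None, "C"],
}

def verify_externally(args):
    """Stub: check an external source (e.g. a database) for actual success."""
    return True

results = {}
for combo in itertools.product(*FLAG_VALUES.values()):
    args = [f"{flag}={value}"
            for flag, value in zip(FLAG_VALUES, combo) if value is not None]
    proc = subprocess.run(["mycommand", *args],
                          capture_output=True, text=True)
    ok = proc.returncode == 0 and verify_externally(args)
    results[tuple(args)] = ok
    print(("PASS" if ok else "FAIL"), "mycommand", *args)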
When I write apps (in scripting languages), I have a function that parses a command line string. I source the file that I'm developing and unit test that function directly rather than involving the shell.

How to protect yourself when refactoring non-regression tests?

Are there specific techniques to consider when refactoring non-regression tests? The code is usually pretty simple, but it's obviously not itself covered by the safety net of a test suite...
When building a non-regression test, I first ensure that it really exhibits the issue that I want to correct, of course. But if I come back later to this test because I want to refactor it (e.g. I just added another very similar test), I usually can't put the code-under-test back in a state where it was exhibiting the first issue. So I can't be sure that the test, once refactored, is still exercising the same paths in the code.
Are there specific techniques to deal with this issue, except being extra careful?
It's not a big problem. The tests test the code, and the code tests the tests. Although it's possible to make a clumsy mistake that causes the test to start passing under all circumstances, it's not likely. You'll be running the tests again and again, so the tests and the code they test get a lot of exercise, and when things change for the worse, tests generally start failing.
Of course, be careful; of course, run the tests immediately before and after refactoring. If you're uncomfortable about your refactoring, do it in a way that allows you to see the test working (both passing and failing). Find a reliable way to fail each test before the refactoring, and write it down. Get to green (all tests passing), then refactor the test. Run the tests; still green? Good. (If not, of course, get back to green, perhaps by starting over.) Perform the changes that made the original, unrefactored tests fail. Red? Same failure as before? Then reinstate the working code and check for green again. Check it in and move on to your next task.
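To make "a reliable way to fail each test" concrete, here is a small hypothetical Python/pytest sketch (the module and the bug are invented): two near-duplicate regression tests folded into one parametrized test, with a comment recording how to make the original failure reappear so the refactored test can be shown red before it is trusted.

# test_parse_size.py -- hypothetical regression tests, after refactoring.
# Reliable way to fail (write it down!): change 1024 back to 1000 in
# parse_size(); both parametrized cases below must then go red with the
# same kind of failure as the original, unrefactored tests.
import pytest

def parse_size(text):
    """Toy code under test: '10k' -> 10240, '3M' -> 3145728."""
    units = {"k": 1024, "M": 1024 ** 2}
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)

@pytest.mark.parametrize("text,expected", [
    ("10k", 10 * 1024),        # the original regression
    ("3M", 3 * 1024 ** 2),     # the newly added, very similar case
])
def test_parse_size_uses_binary_units(text, expected):
    assert parse_size(text) == expected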
Try to include not only positive cases in your automated test, but also negative cases (and a proper handler for them).
Also, you can run your refactored automated test under a debugger with breakpoints and verify that it still exercises all the paths you intended it to exercise.