Test framework for black box regression testing - testing

I am looking for a tool for regression testing a suite of equipment we are building.
The current concept is that you create an input file (text/csv) to the tool specifying inputs to the system under test. The tool then captures the outputs from the system and records the inputs and outputs to an output file.
The output is in the same format as the original input file and can be used as an input for following runs of the tool, with the measured outputs matched with the values from the previous run.
The results of two runs will not match exactly: there are timing differences that depend on the state of the battery or on other internal state of the equipment.
We would have to write our own interfaces to pass the commands from the tool to the equipment and to capture the output of the equipment.
This is a relatively simple task, but I am looking for an existing tool / package / library so that I can avoid reinventing the wheel, or at least borrow lessons from it.

I recently built a system like this on top of git (http://git.or.cz/). Basically, write a program that takes all your input files, sends them to the server, reads the output back, and writes it to a set of output files. After the first run, commit the output files to git.
For future runs, your success is determined by whether the git repository is clean after the run finishes:
test 0 -eq $(git diff data/output/ | wc -l)
As a bonus, you can use all the git tools to compare differences, and commit them if it turns out the differences were an improvement, so that future runs will pass. It also works great when merging between branches.

I'm not sure there will be a single package that exactly suits your needs. You have a few considerations to make:
How to pass data to the equipment and how to collect it back. This is very application specific, but a good option is often the good old serial port (RS232), for which an easy interface exists in most programming languages.
How to run the tests. A unit-testing framework can definitely help you here. The existing frameworks have a lot of the basic features implemented - selecting tests to run, selecting the detail-level of the report (very important for detailed debugging at first and production-stage PASS/FAIL analysis later on). I've had good experience using the test frameworks of both Perl and Python for testing embedded devices.
You also have to decide how to make the comparisons. As you correctly noted, the results won't be equal. This is where your domain knowledge comes in. Usually, it is simply implemented using error margins that are applicable in your domain. Of course, you won't be able to use a basic diff tool and will have to write an intelligent script.
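The error-margin comparison can be sketched roughly as follows. This is only an illustration: it assumes each run is stored as CSV with a `name` column and a numeric `value` column (hypothetical column names), and the 5% tolerance is made up; real margins have to come from your domain.

```python
import csv
import io
import math

def load_run(text):
    """Parse one run's CSV text into {measurement name: float value}."""
    return {row["name"]: float(row["value"])
            for row in csv.DictReader(io.StringIO(text))}

def compare_runs(expected, actual, rel_tol=0.05, abs_tol=0.0):
    """Return a list of human-readable mismatches; an empty list means PASS."""
    problems = []
    for name, want in expected.items():
        if name not in actual:
            problems.append(f"{name}: missing from new run")
        elif not math.isclose(want, actual[name], rel_tol=rel_tol, abs_tol=abs_tol):
            problems.append(f"{name}: expected {want}, got {actual[name]}")
    return problems

# Hypothetical data: boot time drifts slightly, voltage is clearly off.
baseline = load_run("name,value\nboot_ms,120\nvoltage,3.30\n")
new_run = load_run("name,value\nboot_ms,123\nvoltage,3.90\n")
mismatches = compare_runs(baseline, new_run)
print(mismatches)
```

Per-measurement tolerances (for example, a wider margin for battery-dependent timings) could be added by passing a dictionary of tolerances instead of a single value.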

You can just use any test framework. The hard part is writing the tools to send/retrieve the data from your test system, not the actual string comparisons.
Your tests would just all look like this:
x = read_input_file(ifilename);
y1 = read_expected_data(ofilename);
send_input_file_to_server(x);
y2 = read_output_from_server();
checkequal(y1, y2);
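In Python's unittest, for instance, that pattern might look like the sketch below. `run_on_device` and the recorded cases are hypothetical stand-ins; in a real setup the function would talk to the equipment and the cases would be loaded from the input/output files.

```python
import unittest

def run_on_device(commands):
    """Hypothetical stand-in for the code that drives the real equipment."""
    return [cmd.upper() for cmd in commands]

# Each case pairs the inputs with the output recorded on a known-good run.
CASES = {
    "power_on": (["on", "status"], ["ON", "STATUS"]),
    "reset": (["reset"], ["RESET"]),
}

class RegressionTest(unittest.TestCase):
    def test_recorded_cases(self):
        for name, (commands, expected) in CASES.items():
            # subTest reports each failing case separately.
            with self.subTest(case=name):
                self.assertEqual(run_on_device(commands), expected)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(RegressionTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```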

Related

Handling Expected Changes in Regression Tests

I am setting up continuous deployment for my service, which generates XML files as output. To achieve this, we plan to add regression tests to our deployment flow that compare the XML file generated with a code change against the one generated without it.
However, some code changes legitimately alter the output, which makes the test fail.
One approach could be to allow the tests to fail and generate a Diff report which would then be manually approved.
How are such cases handled generally in continuous deployment?
You could use a tool such as xmldiff, which creates human-readable diffs between XML files. If a code change was made that causes a test failure, the diff report would already be generated for you.
I've used similar utilities for screenshot comparison, and although they still require manual review in the end when there are unexpected changes, it speeds up the process quite a bit.
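As a rough sketch of the "generate a diff report for manual approval" idea, using only Python's standard library rather than xmldiff itself: the XML is canonicalized first so that attribute order and indentation do not count as changes. The sample documents are invented, and `ET.canonicalize` needs Python 3.8 or later.

```python
import difflib
from xml.etree import ElementTree as ET

def canonical_lines(xml_text):
    """Canonicalize XML (C14N 2.0) so attribute order and insignificant
    whitespace do not show up as differences, then split into lines."""
    canon = ET.canonicalize(xml_text, strip_text=True)
    # Put each element on its own line so the diff is readable.
    return canon.replace("><", ">\n<").splitlines()

def diff_report(baseline_xml, candidate_xml):
    """Return a unified diff as a list of lines; empty means no change."""
    return list(difflib.unified_diff(
        canonical_lines(baseline_xml),
        canonical_lines(candidate_xml),
        fromfile="baseline", tofile="candidate", lineterm="",
    ))

old = '<order id="1" status="new"><item>apple</item></order>'
same = '<order status="new" id="1">\n  <item>apple</item>\n</order>'
changed = '<order id="1" status="new"><item>pear</item></order>'

print(diff_report(old, same))                # formatting-only change
print("\n".join(diff_report(old, changed)))  # real change, needs approval
```

If the report is non-empty, the pipeline can attach it to the build and wait for someone to approve it, at which point the candidate output becomes the new baseline.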

How Test scripts run for a specific project after completing all test scripts?

I am working on automation with Selenium, Java, and TestNG. I have completed all my test scripts, but I don't have practical experience of using Selenium in the IT industry.
My question is: once the test scripts for a project are complete, how are they run for regression?
1. Using Eclipse (or any IDE) on a regular basis, or
2. Building a jar file to run on a regular basis, or
3. Some other means
Please let me know how this is done from a company's point of view.
It really depends on the company's point of view. In my own practice we ran regression via Selenium from the Eclipse IDE, but if the company uses a continuous integration system, the tests generally have to be run by a machine, so it is probably better to use a jar that writes the test results to a file.
Different companies follow different approaches to regression testing with automated scripts. There is no set standard or SOP (Standard Operating Procedure) for this. Those higher up in the testing department (test lead etc.) have to decide which practice will give the maximum ROI.
For example, in my current organisation, I have been running a CI system, using Jenkins, which runs all the automation scripts, at a specified time on the day the regression is supposed to begin - or we trigger it and the CI takes care of the rest.
In my previous organisation, for regression purposes, we had a dedicated system to which we would copy all the scripts, make all the necessary updates/system changes, and then trigger the scripts to run the tests.
I believe not many big companies run their regression tests individually via the Eclipse IDE, since a whole sprint (or a whole project) involves hundreds of test cases and a lot of scripts, and running them via Eclipse would be tedious and time consuming. Plus, every single test script run would generate a separate report, which would be too complex to store and to debug in case of a failure.
However, as I said, this depends entirely on how the company sees the ROI and effort to be made for this.

Aggregating code coverage from different testing frameworks

In a modern programming workflow, numerous testing frameworks are used at once. For example, in the PHP world it is the de facto standard to use unit tests, integration tests, and functional/acceptance tests together. Most of the time a different framework is used for each test type. I am using a combination of PHPSpec for unit, PHPUnit for integration, and Codeception for functional tests.
Is it possible to aggregate the code coverage results that each of these frameworks returns? Is there any tool that aggregates code coverage reports from different frameworks?
Or is it only possible to view individual results for each framework, even though they are misleading because each code coverage report doesn't take the other tests into account?
It is actually quite simple to perform this task. All your frameworks rely on the same library to generate the code coverage.
As you can see, the generator in sebastianbergmann/php-code-coverage already supports a merge function (line 335) to merge different aggregates. Since you are part of a team using tests, I assume it will be easy for you to change the test execution layer slightly to gather the code coverage in a single PHP process and just merge them.
There is a tool for this: phpcov. It can merge many coverage files with its merge command:
$ parallel --gnu ::: \
'phpunit --coverage-php /tmp/coverage/FooTest.cov tests/FooTest' \
'phpunit --coverage-php /tmp/coverage/BarTest.cov tests/BarTest'
$ phpcov merge /tmp/coverage --clover /tmp/clover.xml
phpcov 2.0.0 by Sebastian Bergmann.
Generating code coverage report in Clover XML format ... done
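Conceptually, merging comes down to combining per-line hit counts across reports, so that a line counts as covered if any framework executed it. The sketch below shows only that idea; the `{file: {line: hits}}` shape is an assumption made for illustration and is not phpcov's file format or implementation.

```python
from collections import defaultdict

def merge_coverage(*reports):
    """Merge reports shaped like {file: {line_number: hit_count}}.
    A line counts as covered if any framework executed it."""
    merged = defaultdict(lambda: defaultdict(int))
    for report in reports:
        for path, lines in report.items():
            for line_no, hits in lines.items():
                merged[path][line_no] += hits
    return {path: dict(lines) for path, lines in merged.items()}

def percent_covered(report):
    """Percentage of known executable lines with at least one hit."""
    total = sum(len(lines) for lines in report.values())
    covered = sum(1 for lines in report.values()
                  for hits in lines.values() if hits > 0)
    return 100.0 * covered / total if total else 0.0

# Invented sample data: each suite alone understates the real coverage.
unit = {"Foo.php": {1: 1, 2: 0, 3: 0}}
functional = {"Foo.php": {1: 0, 2: 4, 3: 0}, "Bar.php": {1: 1}}
combined = merge_coverage(unit, functional)
print(percent_covered(unit), percent_covered(combined))
```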
I think we are in the same boat: how can we tell how much we have covered when using all these different testing tools? We discussed it with the team and decided to go with
SonarSource - for the PHP plugin and live demo
PHP report style - I advise you to visit the live demo. It will help more.
It is a very robust tool. It gives us full insight into the code.
The PHP Test Coverage Tool from Semantic Designs (my company) collects and combines test coverage from any
framework
test set
individual test
even ad hoc manual tests.
After running some set of tests, our tool can easily be triggered to dump test coverage vectors to a file; you need to modify the framework slightly to invoke
TCVDump();
when the framework completes, or you can invoke a TCVDump() by touching an easily found, special web page added by the test coverage tool. Each such call produces a time-stamped or user-named file (e.g., named after the framework or test set) so they are easily distinguished.
The graphical test coverage display included as part of the tool will interactively select and merge small or large sets of such files to produce a coherent whole, both display and summary. It will also compare test coverage vectors to enable one to decide if coverage from one test set includes/intersects another, etc.
The test coverage display component will also export text or XML/HTML summaries of the coverage results.
You can even run tests on different subsystems and combine them. This test coverage tool is part of a larger family of tools for many languages other than PHP; tests run on a multilingual application system can also be combined to provide an overview of coverage for the multilingual application.

Does there exist an established standard for testing command line arguments?

I am developing a command line utility that has a LOT of flags. A typical command looks like this:
mycommand --foo=A --bar=B --jar=C --gnar=D --binks=E
In most cases, a 'success' message is printed but I still want to verify against other sources like an external database to ensure actual success.
I'm starting to create integration tests and I am unsure of the best way to do this. My main concerns are:
There are many, many flag combinations. How do I know which combinations to test? If you do the math for the 10+ flags that can be used together...
Is it necessary to test permutations of flags?
How to build a framework capable of automating the tests and then verifying results.
How to keep track of a large number of flags and put them in some order, so it is easy to tell which combinations have been implemented and which have not.
The thought of manually writing out individual cases and verifying results in a unit-test like format is daunting.
Does anyone know of a pattern that can be used to automate this type of test? Perhaps even software that attempts to solve this problem? How did people working on GNU commandline tools test their software?
I think this is very specific to your application.
First, how do you determine the success of the execution of your application? Is it a result code? Is it something printed to the console?
For question 2, it depends how you parse those flags in your application. Most of the time, order of flags isn't important, but there are cases where it is. I hope you don't need to test for permutations of flags, because it would add a lot of cases to test.
In the general case, you should analyse the impact of each flag. It is possible that a flag doesn't interfere with the others, and then it just needs to be tested once. This is also the case for flags that are meant to be used alone (--help or --version, for example). You also need to analyse what values you should test for each flag. Usually, you want to try each kind of valid value and each kind of invalid value.
I think a simple bash script could be written to perform the tests, or a script in any scripting language, like Python. Using nested loops, you could try, for each flag, the possible values, including invalid values and the case where the flag isn't set. It will produce a multidimensional matrix of results, which should be analysed to see whether the results conform to what is expected.
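The nested loops can be written compactly with `itertools.product`. A minimal sketch, assuming a made-up set of flags and candidate values (`None` stands for "flag not given"); the real list has to come from your analysis of each flag:

```python
import itertools

# Hypothetical flags and candidate values; None means "flag not given".
FLAG_VALUES = {
    "--foo": [None, "A", ""],
    "--bar": [None, "B"],
    "--jar": [None, "C", "not-a-valid-value"],
}

def command_lines(program, flag_values):
    """Yield one argument list per combination of flag values."""
    names = list(flag_values)
    for combo in itertools.product(*(flag_values[n] for n in names)):
        args = [f"{n}={v}" for n, v in zip(names, combo) if v is not None]
        yield [program, *args]

all_commands = list(command_lines("mycommand", FLAG_VALUES))
print(len(all_commands))
for cmd in all_commands[:3]:
    print(" ".join(cmd))
```

Each argument list can then be handed to `subprocess.run` and the exit code and output recorded in the results matrix. The number of combinations is the product of the list lengths, which is why flags that do not interact are better tested on their own.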
When I write apps (in scripting languages), I have a function that parses a command line string. I source the file that I'm developing and unit test that function directly rather than involving the shell.
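In Python, for example, this could look like the sketch below, which tests an argparse-based parser directly without spawning a process. The flags `--foo`, `--bar`, and `--verbose` are hypothetical.

```python
import argparse

def build_parser():
    """Hypothetical parser for a tool with --foo, --bar and --verbose."""
    parser = argparse.ArgumentParser(prog="mycommand")
    parser.add_argument("--foo", required=True)
    parser.add_argument("--bar", type=int, default=0)
    parser.add_argument("--verbose", action="store_true")
    return parser

def parse(argv):
    """Parse an explicit argument list instead of sys.argv."""
    return build_parser().parse_args(argv)

# Order of flags should not matter.
a = parse(["--foo=A", "--bar=3"])
b = parse(["--bar=3", "--foo=A"])
assert a == b

# argparse reports invalid input by raising SystemExit.
try:
    parse(["--bar=not-a-number", "--foo=A"])
    rejected = False
except SystemExit:
    rejected = True
```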

How would one go about testing an interpreter or a compiler?

I've been experimenting with creating an interpreter for Brainfuck, and while quite simple to make and get up and running, part of me wants to be able to run tests against it. I can't seem to fathom how many tests one might have to write to test all the possible instruction combinations to ensure that the implementation is proper.
Obviously, with Brainfuck, the instruction set is small, but I can't help but think that as more instructions are added, your test code would grow exponentially. More so than your typical tests at any rate.
Now, I'm about as newbie as you can get in terms of writing compilers and interpreters, so my assumptions could very well be way off base.
Basically, where do you even begin with testing on something like this?
Testing a compiler is a little different from testing some other kinds of apps, because it's OK for the compiler to produce different assembly-code versions of a program as long as they all do the right thing. However, if you're just testing an interpreter, it's pretty much the same as any other text-based application. Here is a Unix-centric view:
You will want to build up a regression test suite. Each test should have
Source code you will interpret, say test001.bf
Standard input to the program you will interpret, say test001.0
What you expect the interpreter to produce on standard output, say test001.1
What you expect the interpreter to produce on standard error, say test001.2 (you care about standard error because you want to test your interpreter's error messages)
You will need a "run test" script that does something like the following
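function fail {
    echo "Unexpected differences on $1:"
    diff "$2" "$3"
    exit 1
}
# Loop over the test names given as arguments to the script.
for testname
do
    tmp1=$(mktemp)   # captured standard output
    tmp2=$(mktemp)   # captured standard error
    brainfuck "$testname.bf" < "$testname.0" > "$tmp1" 2> "$tmp2"
    # cmp -s is silent and exits non-zero when the files differ
    cmp -s "$testname.1" "$tmp1" || fail "stdout" "$testname.1" "$tmp1"
    cmp -s "$testname.2" "$tmp2" || fail "stderr" "$testname.2" "$tmp2"
    rm -f "$tmp1" "$tmp2"
done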
You will find it helpful to have a "create test" script that does something like
brainfuck $testname.bf < $testname.0 > $testname.1 2> $testname.2
You run this only when you're totally confident that the interpreter works for that case.
You keep your test suite under source control.
It's convenient to embellish your test script so you can leave out files that are expected to be empty.
Any time anything changes, you re-run all the tests. You probably also re-run them all nightly via a cron job.
Finally, you want to add enough tests to get good test coverage of your compiler's source code. The quality of coverage tools varies widely, but GNU Gcov is an adequate coverage tool.
Good luck with your interpreter! If you want to see a lovingly crafted but not very well documented testing infrastructure, go look at the test2 directory for the Quick C-- compiler.
I don't think there's anything 'special' about testing a compiler; in a sense it's almost easier than testing some programs, since a compiler has such a basic high-level summary - you hand in source, it gives you back (possibly) compiled code and (possibly) a set of diagnostic messages.
Like any complex software entity, there will be many code paths, but since it's all very data-oriented (text in, text and bytes out) it's straightforward to author tests.
I’ve written an article on compiler testing, the original conclusion of which (slightly toned down for publication) was: It’s morally wrong to reinvent the wheel. Unless you already know all about the preexisting solutions and have a very good reason for ignoring them, you should start by looking at the tools that already exist. The easiest place to start is Gnu C Torture, but bear in mind that it’s based on Deja Gnu, which has, shall we say, issues. (It took me six attempts even to get the maintainer to allow a critical bug report about the Hello World example onto the mailing list.)
I’ll immodestly suggest that you look at the following as a starting place for tools to investigate:
Software: Practice and Experience, April 2007. (Payware, not available to the general public; free preprint at http://pobox.com/~flash/Practical_Testing_of_C99.pdf.)
http://en.wikipedia.org/wiki/Compiler_correctness#Testing (Largely written by me.)
Compiler testing bibliography (Please let me know of any updates I’ve missed.)
In the case of brainfuck, I think testing it should be done with brainfuck scripts. I would test the following, though:
1: Are all the cells initialized to 0?
2: What happens when you decrement the data pointer when it's currently pointing to the first cell? Does it wrap? Does it point to invalid memory?
3: What happens when you increment the data pointer when it's pointing at the last cell? Does it wrap? Does it point to invalid memory?
4: Does output function correctly?
5: Does input function correctly?
6: Does the [ ] stuff work correctly?
7: What happens when you increment a byte more than 255 times? Does it wrap to 0 properly, or is it incorrectly treated as an integer or other value?
More tests are possible too, but this is probably where I'd start. I wrote a BF compiler a few years ago, and that had a few extra tests. In particular, I tested the [ ] stuff heavily by having a lot of code inside the block, since an early version of my code generator had issues there (on x86 using a jxx I had issues when the block produced more than 128 bytes or so of code, resulting in invalid x86 asm).
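To make the checklist concrete, here is a sketch of how those checks might be run against an implementation. The tiny interpreter is only a stand-in written for this example, and it bakes in assumptions that your own interpreter may not share: 8-bit wrapping cells, an error when the pointer leaves the tape, and 0 stored on end of input.

```python
def run_bf(program, stdin="", tape_len=30000):
    """Tiny reference interpreter: 8-bit wrapping cells, pointer errors raise."""
    # Pre-compute matching bracket positions.
    jumps, stack = {}, []
    for pos, ch in enumerate(program):
        if ch == "[":
            stack.append(pos)
        elif ch == "]":
            if not stack:
                raise SyntaxError("unmatched ]")
            start = stack.pop()
            jumps[start], jumps[pos] = pos, start
    if stack:
        raise SyntaxError("unmatched [")

    tape, ptr, pc, out, inp = [0] * tape_len, 0, 0, [], iter(stdin)
    while pc < len(program):
        ch = program[pc]
        if ch == ">":
            ptr += 1
        elif ch == "<":
            ptr -= 1
        elif ch == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif ch == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif ch == ".":
            out.append(chr(tape[ptr]))
        elif ch == ",":
            tape[ptr] = ord(next(inp, "\0")) % 256  # assumption: EOF stores 0
        elif ch == "[" and tape[ptr] == 0:
            pc = jumps[pc]
        elif ch == "]" and tape[ptr] != 0:
            pc = jumps[pc]
        if not 0 <= ptr < tape_len:
            raise IndexError("data pointer left the tape")
        pc += 1
    return "".join(out)

def raises(exc, program, **kwargs):
    """Helper: True if running the program raises the given exception."""
    try:
        run_bf(program, **kwargs)
    except exc:
        return True
    return False

assert run_bf(".") == "\x00"                   # 1: cells start at 0
assert raises(IndexError, "<")                 # 2: left of the first cell
assert raises(IndexError, ">", tape_len=1)     # 3: right of the last cell
assert run_bf("+" * 65 + ".") == "A"           # 4: output
assert run_bf(",.", stdin="x") == "x"          # 5: input
assert run_bf("++[>+++<-]>.") == chr(6)        # 6: loops
assert run_bf("+" * 256 + ".") == "\x00"       # 7: wrap past 255
```

The same small programs and expected outputs can be fed to any other implementation, which keeps the tests independent of how the interpreter is written.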
You can test with some already written apps.
The secret is to:
Separate the concerns
Observe the law of Demeter
Inject your dependencies
Well, software that is hard to test is a sign that the developer wrote it like it's 1985. Sorry to say that, but by applying the three principles I presented here, even line-numbered BASIC would be unit testable (it IS possible to inject dependencies into BASIC, because you can do "goto variable").