Testing tool: is there an alternative to expect? - testing

Our current testing is a simple in-house tool which runs the named target, logs its output and compares it with an expected output. The expected output and the real output are both text files.
This has an obvious downside: a change with no functional effect (for example, a shift in line numbers in the output) is still reported as a failure.
We are thinking of using a tool like expect for this, but we would like to know whether there are other alternatives. Googling does not return any immediate answers.
Our platform is Linux. The users of this tool do not need to write testing code; currently they just provide a plain text file with the expected result, and we should not ask them to write code or fill in some complex format.
Thanks for your inputs.

Related

In Go when using the Example... testing method is there a way to have it show a diff instead of got... want...?

I've been using Go for a bigger project and love it, and for my testing I've been using the
func ExampleXxx() {
    ... code ...
    // Output:
    // ...expected output...
}
method for testing. When it fails it will say
got:
... bunch of lines showing the output of test ...
want:
... the comment you put in to show what you expected ...
Is there any way to make it show just the difference? I can take the two, copy them to separate files and run a diff, etc., but I'd much rather just have it show the parts that were wrong, as some of my tests have longer output.
Thanks in advance
EDIT:
I'm using http://golang.org/pkg/testing/#hdr-Examples and want the failure output to show a diff, not the current got/want dump. I know I can do the diff manually.
No, you cannot do this. This is not the intended use of Examples.
Examples are a nice way to show how some function behaves: Examples exist to document. The main reason for validating the example output is to make sure the Examples themselves are valid/correct, not that your code is okay. For the latter you have Test functions.
Most often the output of an Example displays the input and output (or just the output) of one invocation of a certain function/method per line; sometimes Examples use the different lines to show parts of a complex result, e.g. one line per element of the returned slice.
I think your use of Examples to "verify the flow of my program" contradicts the intention of Examples. I would use Test functions and use any of the available diff tools to generate got, want, and diff output myself if I wanted to test, e.g., a text processor on large bunches of input.
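For instance, a Test function can report only the lines that differ. Here is a minimal sketch of that idea; the package name and the sample got/want strings are made up, and the example test deliberately fails so the diff-style output is visible:

package mypkg_test

import (
    "strings"
    "testing"
)

// diffLines reports only the lines on which got and want differ,
// instead of dumping both strings in full.
func diffLines(t *testing.T, got, want string) {
    t.Helper()
    gotLines := strings.Split(got, "\n")
    wantLines := strings.Split(want, "\n")
    n := len(gotLines)
    if len(wantLines) > n {
        n = len(wantLines)
    }
    for i := 0; i < n; i++ {
        var g, w string
        if i < len(gotLines) {
            g = gotLines[i]
        }
        if i < len(wantLines) {
            w = wantLines[i]
        }
        if g != w {
            t.Errorf("line %d:\n  got:  %q\n  want: %q", i+1, g, w)
        }
    }
}

func TestProcessor(t *testing.T) {
    got := "line one\nline two\nline 3"      // stand-in for real program output
    want := "line one\nline two\nline three" // expected text; differs on purpose here
    diffLines(t, got, want)
}

Running go test then prints only the third line rather than both complete texts.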
If I understand your question correctly, it sounds like GoConvey would do the trick... It's a TDD tool that runs in the browser, and it will show you colored diffs for most failures:
You can use it with your existing tests; if you don't want to convert to the GoConvey DSL, that's okay. You don't have to be using a TDD workflow per se in order for it to work, but if you can run go test, this tool should be able to pick it up. (I've never used the Example functions for testing... I'm not sure what you mean by that, honestly.)
(There's a new web UI in the works that will still show the diff. See below.)
These are contrived examples, but obviously the diff is more useful with longer output.
Is that kind of what you're looking for?
From the style guide: https://code.google.com/p/go-wiki/wiki/Style#Useful_Test_Failures
if got != tt.want {
    t.Errorf("Foo(%q) = %d; want %d", tt.in, got, tt.want) // or Fatalf, if test can't test anything more past this point
}
This will print only the errors, of course. There is nothing built into Go that shows a diff. I think just piping this to whatever diff tool you already use, against your last output, would still be best.
Go is great, but there is no reason to reinvent tools that already do a fantastic job at the terminal.
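If you do want a real diff in the failure message, one low-tech option is to write got and want to temporary files and shell out to the diff that is already on your system. A rough sketch, assuming a diff binary on PATH (names here are illustrative, not from the question):

package mypkg_test

import (
    "os"
    "os/exec"
    "path/filepath"
    "testing"
)

// diffStrings writes got and want to temporary files and returns the
// unified diff between them; an empty result means they are identical.
func diffStrings(t *testing.T, got, want string) string {
    t.Helper()
    dir := t.TempDir()
    gotFile := filepath.Join(dir, "got.txt")
    wantFile := filepath.Join(dir, "want.txt")
    if err := os.WriteFile(gotFile, []byte(got), 0o644); err != nil {
        t.Fatal(err)
    }
    if err := os.WriteFile(wantFile, []byte(want), 0o644); err != nil {
        t.Fatal(err)
    }
    // diff exits non-zero when the files differ, so the error is ignored here.
    out, _ := exec.Command("diff", "-u", wantFile, gotFile).CombinedOutput()
    return string(out)
}

A test would then call t.Errorf with the returned string whenever it is non-empty.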

Does there exist an established standard for testing command line arguments?

I am developing a command line utility that has a LOT of flags. A typical command looks like this:
mycommand --foo=A --bar=B --jar=C --gnar=D --binks=E
In most cases, a 'success' message is printed but I still want to verify against other sources like an external database to ensure actual success.
I'm starting to create integration tests and I am unsure of the best way to do this. My main concerns are:
There are many, many flag combinations; how do I know which combinations to test? If you do the math for the 10+ flags that can be used together...
Is it necessary to test permutations of flags?
How to build a framework capable of automating the tests and then verifying results.
How to keep track of a large number of flags, and provide an ordering so it is easy to tell which combinations have been covered and which have not.
The thought of manually writing out individual cases and verifying results in a unit-test like format is daunting.
Does anyone know of a pattern that can be used to automate this type of test? Perhaps even software that attempts to solve this problem? How did people working on GNU commandline tools test their software?
I think this is very specific to your application.
First, how do you determine the success of an execution of your application? Is it a result code? Is it something printed to the console?
For question 2, it depends how you parse those flags in your application. Most of the time the order of flags isn't important, but there are cases where it is. I hope you don't need to test permutations of flags, because that would add a lot of cases to test.
In the general case, you should analyse the impact of each flag. It is possible that a flag doesn't interfere with the others, in which case it only needs to be tested once. This is also the case for flags that are meant to be used alone (--help or --version, for example). You also need to decide which values to test for each flag. Usually, you want to try each kind of possible valid value and each kind of possible invalid value.
I think a simple script could be written to perform the tests, in bash or in any scripting language like Python. Using nested loops, you could try, for each flag, its possible values, including invalid values and the case where the flag isn't set. This will produce a multidimensional matrix of results, which should be analysed to see whether the results conform to what is expected.
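A minimal sketch of those nested loops, written in Go for consistency with the other examples in this thread (a bash or Python script, as suggested above, works just as well; mycommand and the flag values are placeholders taken from the question):

package main

import (
    "fmt"
    "os/exec"
)

func main() {
    // Candidate values per flag: "" means "flag not set", and each list
    // includes a deliberately invalid value. All values are placeholders.
    foos := []string{"", "A", "not-a-valid-foo"}
    bars := []string{"", "B", "not-a-valid-bar"}

    for _, foo := range foos {
        for _, bar := range bars {
            var args []string
            if foo != "" {
                args = append(args, "--foo="+foo)
            }
            if bar != "" {
                args = append(args, "--bar="+bar)
            }
            out, err := exec.Command("mycommand", args...).CombinedOutput()
            // Record one cell of the result matrix; a real harness would also
            // check the external database mentioned in the question.
            fmt.Printf("args=%v err=%v output=%q\n", args, err, out)
        }
    }
}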
When I write apps (in scripting languages), I have a function that parses a command line string. I source the file that I'm developing and unit test that function directly rather than involving the shell.
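In Go, the same approach might look like factoring the parsing into a function that takes an argument slice instead of reading os.Args, so a test can call it directly. The names below are illustrative only:

package main

import "flag"

// config holds the parsed flag values; the fields mirror the flags in the question.
type config struct {
    foo string
    bar string
}

// parseArgs parses an explicit argument slice rather than os.Args,
// so tests can exercise the parsing logic without involving the shell.
func parseArgs(args []string) (config, error) {
    var c config
    fs := flag.NewFlagSet("mycommand", flag.ContinueOnError)
    fs.StringVar(&c.foo, "foo", "", "value of foo")
    fs.StringVar(&c.bar, "bar", "", "value of bar")
    err := fs.Parse(args)
    return c, err
}

A test then simply calls parseArgs([]string{"--foo=A", "--bar=B"}) and checks the returned struct.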

Print complete control flow through gdb including values of variables

The idea is that, given a specific input to the program, I want to automatically step through the complete program and dump its control flow along with all the data being used, such as classes and their variables. Is there a straightforward way to do this? Can this be done by scripting gdb, or does it require modifying gdb?
OK, the reason for this question is an idea for a debugging tool. What it does is this: given two different inputs to a program, one causing an incorrect output and the other a correct one, it tells you where the control flow differs between them.
So what I think is needed is a complete dump of these two control flows going into a diff engine. If the two inputs follow similar control flows, their diff would (in many cases) give a good idea of why the bug exists.
This could be made into a very engaging tool with many features built on top of it.
Tell us a little more about the environment. dtrace, for example, will do a marvelous job of this in Solaris or Leopard. gprof is another possibility.
A crude version of this could be done with yes(1) or expect(1).
If you want to get fancy, GDB can be scripted with Python in some versions.
What you are describing sounds a bit like gdb's "tracepoint debugging". See gdb's internal help ("help tracepoint"). You can also see a whitepaper here: http://sourceware.org/gdb/talks/esc-west-1999/
Unfortunately, this functionality is not currently implemented for native debugging, but I believe that CodeSourcery is doing some work on it.
Check this out: unlike Coverity, Fenris is free and widely used.
How to print the next N executed lines automatically in GDB?

Best way to test command line tools? [closed]

We have a large collection of command-line utilities that we write ourselves and use frequently. At the moment, testing them is very cumbersome and, consequently, we don't do as much testing as we ought to.
I am wondering if anyone can suggest good techniques or tools for doing a good job of this kind of thing.
This is UNIX.
I recommend structuring your command line tool's code so that the command line utility is a client to a library of functions and/or classes.
Rather than simply using std::cout to print output, have the library function take an ostream reference that defaults to std::cout. When you are testing, provide a std::stringstream to collect the output.
Finally, simply compare your utility's output with expected results using your favorite unit testing framework.
(I apologize for the C++ specific example... I'm sure there are ways to do similar things in other languages too).
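For what it's worth, the Go equivalent of that ostream trick is to have the library function take an io.Writer: the command-line tool passes os.Stdout, and the test passes a bytes.Buffer. A rough sketch with made-up names:

package report

import (
    "fmt"
    "io"
)

// WriteReport writes its output to w instead of printing directly, so the
// command-line wrapper can pass os.Stdout and a test can pass a bytes.Buffer.
func WriteReport(w io.Writer, items []string) error {
    for i, item := range items {
        if _, err := fmt.Fprintf(w, "%d: %s\n", i+1, item); err != nil {
            return err
        }
    }
    return nil
}

A test then compares buf.String() against the expected text with whatever unit testing framework you prefer.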
You can write tests that resemble an interactive shell session using Cram. It has a flexible test specification format that allows you to match output using Perl regexes or shell-like wildcards. Cram will replay commands from the test, compare the output to the reference, and report differences.
Aruba is a Cucumber extension for testing command line applications written in any programming language.
To use it, you will need ruby to run the tests, but the purpose of aruba is to provide a library of pre-defined step definitions so that you won't need to write any ruby code to make a workable test suite. (Though at some point you probably will want to write a bit of ruby to make a few custom steps.)
You can see a sophisticated example of a command line tool tested with aruba here: jingweno/gh
You should be able to call them from a shell script (a batch file, on MS operating systems), redirect the output to a file, then scan the file programmatically to ensure that it has the correct output. I'm not aware of a testing framework that automates this for you, but it should be fairly straightforward to set up yourself.
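The redirect-and-compare step can also live inside an ordinary test. Here is a sketch that runs the tool, captures its output, and compares it against a checked-in "golden" file; the tool name, arguments, and file paths are placeholders:

package main_test

import (
    "os"
    "os/exec"
    "testing"
)

func TestToolOutput(t *testing.T) {
    // Run the tool and capture its standard output.
    out, err := exec.Command("./mytool", "--flag", "value").Output()
    if err != nil {
        t.Fatalf("running tool: %v", err)
    }

    // Compare against the expected output stored alongside the tests.
    want, err := os.ReadFile("testdata/expected_output.txt")
    if err != nil {
        t.Fatalf("reading golden file: %v", err)
    }
    if string(out) != string(want) {
        t.Errorf("output mismatch:\ngot:\n%s\nwant:\n%s", out, want)
    }
}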
Bats (Bash Automated Testing System) by Sam Stephenson. It is tiny, written purely in shell and has a nice set of features.
The previously suggested Aruba looks interesting, but in some cases it might be quite overkill in terms of dependencies (Ruby, Cucumber).
I did a little bit of this (a loooong time ago hehe) using Expect to check that what happened was what I, umm, expected
I have developed a tool "Exactly"
https://github.com/emilkarlen/exactly
It executes the thing to test in a temporary sandbox directory.
The README contains a number of examples.
A test of a hypothetical program "classify-files-by-moving-to-appropriate-dir" can look like this:
[setup]
dir input
dir output/good
dir output/bad
file input/a.txt = <<EOF
GOOD contents
EOF
file input/b.txt = <<EOF
bad contents
EOF
[act]
classify-files-by-moving-to-appropriate-dir GOOD input/ output/
[assert]
dir-contents input empty
exists output/good/a.txt : type file
dir-contents output/good num-files == 1
exists output/bad/b.txt : type file
dir-contents output/bad num-files == 1
You can do this from a batch file or the Windows Scripting Host.
But I suggest using a task scheduler like Wincron (http://www.splinterware.com/products/wincron.htm) or other free/professional software.
There you can easily copy/paste the command-line parameters that you want to vary when you need to run your software many hundreds of times.
You could use Perl with the Test::More library, which provides a great framework for testing CLIs.
Though primarily designed for unit testing, you could extend it to test user workflows.
Some of the methods:
# Various ways to say "ok"
ok($got eq $expected, $test_name);
is ($got, $expected, $test_name);
isnt($got, $expected, $test_name);
# Rather than print STDERR "# here's what went wrong\n"
diag("here's what went wrong");
like ($got, qr/expected/, $test_name);
unlike($got, qr/expected/, $test_name);
cmp_ok($got, '==', $expected, $test_name);

How would one go about testing an interpreter or a compiler?

I've been experimenting with creating an interpreter for Brainfuck, and while it is quite simple to make and get up and running, part of me wants to be able to run tests against it. I can't seem to fathom how many tests one might have to write to cover all the possible instruction combinations and ensure that the implementation is correct.
Obviously, with Brainfuck, the instruction set is small, but I can't help but think that as more instructions are added, your test code would grow exponentially. More so than your typical tests at any rate.
Now, I'm about as newbie as you can get in terms of writing compilers and interpreters, so my assumptions could very well be way off base.
Basically, where do you even begin with testing on something like this?
Testing a compiler is a little different from testing some other kinds of apps, because it's OK for the compiler to produce different assembly-code versions of a program as long as they all do the right thing. However, if you're just testing an interpreter, it's pretty much the same as any other text-based application. Here is a Unix-centric view:
You will want to build up a regression test suite. Each test should have
Source code you will interpret, say test001.bf
Standard input to the program you will interpret, say test001.0
What you expect the interpreter to produce on standard output, say test001.1
What you expect the interpreter to produce on standard error, say test001.2 (you care about standard error because you want to test your interpreter's error messages)
You will need a "run test" script that does something like the following
function fail {
  echo "Unexpected differences on $1:"
  diff $2 $3
  exit 1
}

for testname
do
  tmp1=$(tempfile)
  tmp2=$(tempfile)
  # Run the interpreter, capturing stdout and stderr separately.
  brainfuck $testname.bf < $testname.0 > $tmp1 2> $tmp2
  cmp -s $testname.1 $tmp1 || fail "stdout" $testname.1 $tmp1
  cmp -s $testname.2 $tmp2 || fail "stderr" $testname.2 $tmp2
done
You will find it helpful to have a "create test" script that does something like
brainfuck $testname.bf < $testname.0 > $testname.1 2> $testname.2
You run this only when you're totally confident that the interpreter works for that case.
You keep your test suite under source control.
It's convenient to embellish your test script so you can leave out files that are expected to be empty.
Any time anything changes, you re-run all the tests. You probably also re-run them all nightly via a cron job.
Finally, you want to add enough tests to get good test coverage of your compiler's source code. The quality of coverage tools varies widely, but GNU Gcov is an adequate coverage tool.
Good luck with your interpreter! If you want to see a lovingly crafted but not very well documented testing infrastructure, go look at the test2 directory for the Quick C-- compiler.
I don't think there's anything 'special' about testing a compiler; in a sense it's almost easier than testing some programs, since a compiler has such a basic high-level summary - you hand in source, it gives you back (possibly) compiled code and (possibly) a set of diagnostic messages.
Like any complex software entity, there will be many code paths, but since it's all very data-oriented (text in, text and bytes out) it's straightforward to author tests.
I’ve written an article on compiler testing, the original conclusion of which (slightly toned down for publication) was: It’s morally wrong to reinvent the wheel. Unless you already know all about the preexisting solutions and have a very good reason for ignoring them, you should start by looking at the tools that already exist. The easiest place to start is Gnu C Torture, but bear in mind that it’s based on Deja Gnu, which has, shall we say, issues. (It took me six attempts even to get the maintainer to allow a critical bug report about the Hello World example onto the mailing list.)
I’ll immodestly suggest that you look at the following as a starting place for tools to investigate:
Software: Practice and Experience, April 2007. (Payware, not available to the general public; free preprint at http://pobox.com/~flash/Practical_Testing_of_C99.pdf.)
http://en.wikipedia.org/wiki/Compiler_correctness#Testing (Largely written by me.)
Compiler testing bibliography (Please let me know of any updates I’ve missed.)
In the case of brainfuck, I think testing it should be done with brainfuck scripts. I would test the following, though:
1: Are all the cells initialized to 0?
2: What happens when you decrement the data pointer while it's pointing to the first cell? Does it wrap? Does it point to invalid memory?
3: What happens when you increment the data pointer while it's pointing at the last cell? Does it wrap? Does it point to invalid memory?
4: Does output function correctly?
5: Does input function correctly?
6: Does the [ ] construct work correctly?
7: What happens when you increment a byte more than 255 times? Does it wrap to 0 properly, or is it incorrectly treated as an integer or some other wider value?
More tests are possible too, but this is probably where I'd start. I wrote a BF compiler a few years ago, and it had a few extra tests. In particular, I tested the [ ] construct heavily by putting a lot of code inside the block, since an early version of my code generator had issues there (on x86, using a jxx, I had problems when the block produced more than 128 bytes or so of code, resulting in invalid x86 asm).
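If the interpreter is also callable as a function, those cases translate naturally into a table-driven test. The sketch below assumes a hypothetical entry point Run(program, input string) (string, error) in the interpreter's own package and 8-bit wrapping cells; adjust it to your real API and semantics:

package bf

import "testing"

// This sketch assumes the package defines
//     func Run(program, input string) (string, error)
// which executes a Brainfuck program and returns its output.
func TestInterpreter(t *testing.T) {
    tests := []struct {
        name    string
        program string
        input   string
        want    string
    }{
        {"cells start at zero", ".", "", "\x00"},
        {"echo one byte of input", ",.", "A", "A"},
        {"loop clears a cell", "+++[-].", "", "\x00"},
        {"byte wraps below zero", "-.", "", "\xff"}, // assumes 8-bit wrapping cells
    }
    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            got, err := Run(tt.program, tt.input)
            if err != nil {
                t.Fatalf("Run() error: %v", err)
            }
            if got != tt.want {
                t.Errorf("%s: got %q, want %q", tt.program, got, tt.want)
            }
        })
    }
}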
You can also test it with some already-written Brainfuck programs.
The secret is to:
Separate the concerns
Observe the law of Demeter
Inject your dependencies
Well, software that is hard to test is a sign that the developer wrote it like it's 1985. Sorry to say that, but by applying the three principles presented here, even line-numbered BASIC would be unit testable (it IS possible to inject dependencies into BASIC, because you can do "goto variable").