Factorial code coverage - testing

When testing a recursive function such as the following factorial method, is it necessary to test the base case? If I pass in 3, the output is 7, and code coverage reports show 100%. However, I didn't explicitly test factorial(0). What are your thoughts on this?
public class Factorial
{
    public static double factorial(int x)
    {
        if (x == 0)
        {
            return 1.0;
        }
        return x + factorial(x - 1);
    }
}

Code coverage doesn't tell you everything. In this case you'll get 100% line coverage for factorial(3) but it won't cover all "cases".
When testing a recursive function you'd want to test various cases:
Each of the base cases.
The recursive cases.
Incorrect input (e.g., negative numbers).
Any function-specific edge cases.
You can test less, but you'll leave yourself open to potential bugs when the code is changed in the future.
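As a concrete sketch, a JUnit 5 test set along those lines might look as follows (the test names and expected values are my own; note that the recursive-case assertion would actually fail against the code as posted, since the method adds instead of multiplies and factorial(3) returns 7 rather than 6, which is exactly the kind of bug 100% coverage can hide):

import static org.junit.jupiter.api.Assertions.*;

import org.junit.jupiter.api.Test;

class FactorialTest {

    @Test
    void baseCase() {
        // The base case, tested explicitly.
        assertEquals(1.0, Factorial.factorial(0));
    }

    @Test
    void recursiveCase() {
        // Encodes the intended behavior; fails against the posted code,
        // which returns 7.0 because it adds instead of multiplying.
        assertEquals(6.0, Factorial.factorial(3));
    }

    @Test
    void negativeInput() {
        // As posted, factorial(-1) recurses until the stack overflows; a guard
        // clause throwing IllegalArgumentException would be the usual fix.
        assertThrows(StackOverflowError.class, () -> Factorial.factorial(-1));
    }
}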

Technically, if you test (well, execute) factorial(2), you also test (execute) factorial(1), if your algorithm is correct.
How would your testing tool know that your code was correct? How would a tester who didn't write the code know that it was correct? The point of "test coverage" is to determine how much of the code has been tested (well, executed), regardless of whether it is algorithmically correct or not.
A line not executed is a line for which you have no evidence that it works; that is what test coverage tells you. The purpose of your tests is to check that the application computes the correct answer. Coverage and tests serve two different purposes; coverage just happens to take advantage of the fact that tests exercise code.

gcov/lcov + googletest create an artificially low branch coverage report

First, I am well aware of the "hidden branch" problem caused by throws/exceptions. This is not that.
What I am observing is:
My test framework (googletest) has testing macros (EXPECT_TRUE for example).
I write passing tests using the macros.
Measuring branch coverage now asymptotes at 50%, because I have not evaluated the assertion in both a passing and a failing condition...
Consider the following:
TEST(MyTests, ContrivedTest)
{
    EXPECT_TRUE(function_that_always_returns_true());
}
Now, assuming that I have every line and every branch perfectly covered in function_that_always_returns_true(), this branch coverage report will asymptote at 50%, because gcov never observes the EXPECT_TRUE line evaluating in a failing condition (intentionally, since the test passes).
The only workaround I've come up with is to exclude the evaluation macros with something like LCOV_EXCL_BR_LINE, but this feels both un-ergonomic and hacky.
TEST(MyTests, ContrivedTest)
{
    bool my_value = function_that_always_returns_true();
    EXPECT_TRUE(my_value); // LCOV_EXCL_BR_LINE
}
This cannot be a niche problem, and I have to believe that people successfully use googletest with lcov/gcov. What do people do to get around this limitation?
After looking for far too long, I realized that all the testing calls I want to filter out are of the pattern EXPECT_*. So simply adding:
lcov_excl_br_line=LCOV_EXCL_BR_LINE|EXPECT_*
to my lcovrc solved my problem.

Where does the KeY verification tool shine?

What are some code examples demonstrating KeY’s strength?
Details
With so many formal methods tools available, I was wondering where KeY is better than its competition, and how. Some readable code examples would be quite helpful for comparison and understanding.
Updates
Searching through the KeY website, I found code examples from the book — is there a suitable code example in there somewhere?
Furthermore, I found a paper about the bug that KeY found in Java 8's mergeCollapse in TimSort. What is a minimal code example from TimSort that demonstrates KeY's strength? I do not understand, however, why model checking supposedly cannot find the bug; a bit array with 64 elements should not be too large to handle. Are other deductive verification tools just as capable of finding the bug?
Is there an established verification competition with suitable code examples?
This is a very hard question, which is probably why it has gone unanswered for more than a year after being asked (even though we in the KeY community are well aware of it...).
The Power of Interaction
First, I'd like to point out that KeY is basically the only tool out there allowing for interactive proofs of Java programs. Although many proofs work automatically and we have quite powerful automatic strategies at hand, sometimes interaction is required to understand why a proof fails (too weak or even wrong specifications, wrong code or "just" a prover incapacity) and to add suitable corrections or strengthenings.
Feedback from Proof Inspection
Especially in the case of a prover incapacity (specification and program are OK, but the problem is too hard for the prover to succeed automatically), interaction is a powerful feature. Many program provers (like OpenJML, Dafny, Frama-C, etc.) rely on SMT solvers in the backend, which they feed with many more or less small verification conditions. The verification status of these conditions is then reported back to the user, basically as pass or fail -- or timeout. When an assertion fails, a user can change the program or refine the specifications, but cannot inspect the state of the proof to deduce why something went wrong; this style is sometimes called "auto-active", as opposed to interactive. While this can be quite convenient in many cases (especially when proofs pass, since the SMT solvers can be really quick in proving something), it can be hard to mine SMT solver output for information. Not even the SMT solvers themselves know why something went wrong (although they can produce a counterexample), as they are just fed a set of formulas for which they attempt to find a contradiction.
TimSort: A Complicated Algorithmic Problem
For the TimSort proof which you mentioned, we had to use a lot of interaction to make the proofs pass. Take, for instance, the mergeHi method of the sorting algorithm, which was proven by one of the most experienced KeY power users known to me. In this proof of 460K proof nodes, 3K user interactions were necessary, consisting of quite a lot of simple ones, like hiding distracting formulas, but also of 478 quantifier instantiations and about 300 cuts (on-line lemma introduction). The code of that method features many difficult Java features, like nested loops with labeled breaks, integer overflows, bit arithmetic and so on; in particular, there are a lot of potential exceptions and other reasons for branching in the proof tree (which is why, in addition, five manual state-merging rule applications were used in the proof). The workflow for proving this method was basically to let the strategies run for some time, check the proof state afterward, prune back the proof, introduce a helpful lemma to reduce the overall proof work, and start again; occasionally, quantifiers were instantiated manually when the strategies failed to find the right instantiation by themselves, and proof tree branches were merged to tackle state explosion. I would claim that proving this code is (at least currently) not possible with auto-active tools, where you cannot guide the prover in that way and cannot obtain the right feedback for knowing how to guide it.
Strength of KeY
Concluding, I'd say that KeY is strong in proving hard algorithmic problems (like sorting, etc.) where you have complicated quantified invariants and integer arithmetic with overflows, and where you need to find quantifier instantiations and small lemmas on the fly by inspecting and interacting with the proof state. The KeY approach of semi-interactive verification also excels in general in cases where SMT solvers time out, leaving a user unable to tell whether something is wrong or an additional lemma is required.
KeY can of course also prove "simple" problems, but there you need to take care that your program does not contain an unsupported Java feature like floating point numbers or multithreading; also, library methods can be quite a problem if they're not yet specified in JML (but this problem applies to other approaches as well).
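To give a flavor of what such specifications look like, here is a minimal, hypothetical JML contract of my own for a max method (this specific example is not taken from the KeY book or website). KeY turns each such contract into a proof obligation in its dynamic logic, and for something this small the automatic strategies typically close the proof without any interaction:

public class MathUtil {
    /*@ public normal_behavior
      @   ensures \result >= a && \result >= b;
      @   ensures \result == a || \result == b;
      @*/
    public static int max(int a, int b) {
        return a >= b ? a : b;
    }
}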
Ongoing Developments
As a side remark, I would also like to point out that KeY is more and more being transformed into a platform for static analysis of different kinds of program properties (not only functional correctness of Java programs). On the one hand, we have developed tools such as the Symbolic Execution Debugger, which can also be used by non-experts to examine the behavior of a sequential Java program. On the other hand, we are currently busy refactoring the architecture of the system to make it possible to add frontends for languages other than Java (in our internal project "KeY-RED"); furthermore, there are ongoing efforts to modernize the Java frontend so that newer language features like lambdas are supported as well. We are also looking into relational properties like compiler correctness. And while we already support the integration of third-party SMT solvers, our integrated logic core will still be there to support understanding proof situations and manual interaction in cases where SMT and automation fail.
TimSort Code Example
Since you asked for a code example... I cannot right now think of "the" code example showing KeY's strength, but to give you a flavor of the complexity of mergeHi in the TimSort algorithm, here is a shortened excerpt with some comments (the full method has about 100 lines of code):
private void mergeHi(int base1, int len1, int base2, int len2) {
    // ...
    T[] tmp = ensureCapacity(len2);            // Method call by contract
    System.arraycopy(a, base2, tmp, 0, len2);  // Manually specified library method
    // ...
    a[dest--] = a[cursor1--];  // potential overflow, NullPointerException, ArrayIndexOutOfBoundsException
    if (--len1 == 0) {
        System.arraycopy(tmp, 0, a, dest - (len2 - 1), len2);
        return;  // Proof branching
    }
    if (len2 == 1) {
        // ...
        return;  // Proof branching
    }
    // ...
    outer:  // Loop labels...
    while (true) {
        // ...
        do {  // Nested loop
            if (c.compare(tmp[cursor2], a[cursor1]) < 0) {
                // ...
                if (--len1 == 0)
                    break outer;  // Labeled break
            } else {
                // ...
                if (--len2 == 1)
                    break outer;  // Labeled break
            }
        } while ((count1 | count2) < minGallop);  // Bit arithmetic
        do {  // 2nd nested loop
            // That's one complex statement below...
            count1 = len1 - gallopRight(tmp[cursor2], a, base1, len1, len1 - 1, c);
            if (count1 != 0) {
                // ...
                if (len1 == 0)
                    break outer;
            }
            // ...
            if (--len2 == 1)
                break outer;
            count2 = len2 - gallopLeft(a[cursor1], tmp, 0, len2, len2 - 1, c);
            if (count2 != 0) {
                // ...
                if (len2 <= 1)
                    break outer;
            }
            a[dest--] = a[cursor1--];
            if (--len1 == 0)
                break outer;
            // ...
        } while (count1 >= MIN_GALLOP | count2 >= MIN_GALLOP);
        // ...
    }  // End of "outer" loop
    this.minGallop = minGallop < 1 ? 1 : minGallop;  // Write back to field
    if (len2 == 1) {
        // ...
    } else if (len2 == 0) {
        throw new IllegalArgumentException(
            "Comparison method violates its general contract!");
    } else {
        System.arraycopy(tmp, 0, a, dest - (len2 - 1), len2);
    }
}
Verification Competition
VerifyThis is an established competition for logic-based verification tools which will have its 7th iteration in 2019. The concrete challenges for past events can be downloaded from the "archive" section of the website I linked. Two KeY teams participated there in 2017. The overall winner that year was Why3. An interesting observation is that there was one problem, Pair Insertion Sort, which came as a simplified and as an optimized Java version, for which no team succeeded in verifying the real-world optimized version on site. However, a KeY team finished that proof in the weeks after the event. I think that highlights my point: KeY proofs of difficult algorithmic problems take their time and require expertise, but they're likely to succeed due to the combined power of strategies and interaction.

Data Flow Coverage

If you write a program, it is usually possible to drive it such that all paths are covered. Hence, 100% coverage is easy to obtain (ignoring infeasible code paths, which modern compilers catch anyway).
However, 100% code coverage should imply that all variable definition-use coverage is also achieved, because variables are defined within the program and used within it. If all code is covered, all DU pairs should also be covered.
Why, then, is it said that path coverage is easier to obtain, but that 100% data flow coverage is usually not achievable? I do not understand why not. What would be an example of that?
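For illustration, here is a small hypothetical Java method f (my own sketch) where two tests cover every statement and every branch, yet never exercise the definition-use pair from x = 1 to return x:

static int f(boolean a, boolean b) {
    int x = 0;       // definition d0 of x
    if (a) {
        x = 1;       // definition d1 of x
    }
    if (b) {
        return x;    // use of x: the DU pair (d1, this use) needs a && b
    }
    return 0;
}

The tests f(true, false) and f(false, true) together achieve 100% statement and branch coverage, but no single run has both a and b true, so the value defined at d1 is never the one returned; covering all DU pairs requires reasoning about paths, not just statements or branches.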
It's easier to achieve 100% code coverage than to cover all possible inputs, because the set of all possible inputs can be extremely large or practically unlimited. It would take too much time to test them all.
Let's look at a simple example function:
double invert(double x) {
    return 1.0 / x;
}
A unit test could look like this:
double y = invert(5);
double expected = 1.0/5.0;
EXPECT_EQ( expected, y );
This test achieves 100% code coverage. However, it exercises only 1 of about 1.8446744e+19 possible inputs (2^64, assuming a double is 64 bits wide).
The idea behind All-pairs Testing is that it's not practical to test every possible input, so we have to identify the ranges that would cover all cases.
With my invert() function, there are at least two sets that matter: {non-zero values} and {zero}.
We need to add another test, which covers the same code path but has a different outcome. (Note that as written, invert does not throw on zero: in IEEE arithmetic, 1.0/0.0 yields infinity, so that is what the test should check for.)
EXPECT_TRUE(std::isinf(invert(0.0)));  // from <cmath>; 1.0/0.0 produces inf, not an exception
Furthermore, since the test writer has to design the different sets of parameters that achieve full data-input coverage, it can be impossible to know what the correct sets are without knowing the implementation.
Consider this function:
double multiply(double x, double y);
My instinct would be to write tests for small numbers and another for big numbers, to test overflow.
However, the developer may have written it poorly, in this way:
double multiply(double x, double y) {
    if (x == 0) return 0;
    return 1.0 / ((1.0 / x) * (1.0 / y));
}
If our tests didn't use 0 for y, then we'd miss a bug. Knowledge of how the algorithms are designed is very important in understanding the proper inputs for a unit test, and that's why the programmers who write the code need to be involved in unit testing.

In any program, doesn't 100% statement coverage imply 100% branch coverage?

While solving MCQs for a practice test, I came across this statement: "In any program, 100% statement coverage implies 100% branch coverage", and it is marked as incorrect. I think it is a correct statement, because if we cover all the statements, then we also cover all the paths and hence all the branches. Could someone please shed more light on this?
Consider this code:
...
if (SomeCondition) DoSomething();
...
If SomeCondition is always true, you can have 100% statement coverage (SomeCondition and DoSomething() will be covered), but you never exercise the case when the condition is false, when you skip DoSomething().
In the example below, a = true covers 100% of the statements but fails to test the branch in which a division-by-zero fault is possible.
int fun(bool a) {
    int x = 0;
    if (a) x = 1;
    return 100 / x;  // divides by zero when a is false
}
For a test set to achieve 100% branch coverage, every branching point in the code must have been taken in each direction, at least once.
The archetypical example, showing that 100% statement coverage does not imply 100% branch coverage, was already given by Alexey Frunze. It is a consequence of the fact that (at least in the majority of programming languages) it is possible to have branches that do not involve statements (such a branch basically skips the statements in the other branch).
The reason for wanting 100% branch coverage, rather than just 100% statement coverage, is that your tests must also show that skipping some statements works as expected.
My main reason for providing this answer is to point out that the converse, viz. "100% branch coverage implies 100% statement coverage" is correct.
Just because you cover every statement doesn't mean that you have covered every branch the program could have taken.
You have to look at every possible branch, not just the statements inside each branch.
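To make the point executable, here is a minimal JUnit 5 sketch (the Java port of fun and the test names are mine); in Java the untested branch surfaces as an ArithmeticException:

import static org.junit.jupiter.api.Assertions.*;

import org.junit.jupiter.api.Test;

class BranchCoverageTest {

    // Java port of the fun example above.
    static int fun(boolean a) {
        int x = 0;
        if (a) x = 1;
        return 100 / x;  // divides by zero when a is false
    }

    @Test
    void coversEveryStatement() {
        // This single test executes every statement: 100% statement coverage.
        assertEquals(100, fun(true));
    }

    @Test
    void coversTheMissingBranch() {
        // Only this test takes the implicit "else" of if (a), where x stays 0,
        // revealing the division-by-zero fault.
        assertThrows(ArithmeticException.class, () -> fun(false));
    }
}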

What is the difference between an IF, CASE, and WHILE statement

I just want to know what the difference is between all the conditional statements in Objective-C, and which one is faster and lighter.
One piece of advice: stop worrying about which language constructs are microscopically faster or slower than which others, and instead focus on which ones let you express yourself best.
If and case statements described
While statement described
Since these statements do different things, it is nonsensical to debate which is faster.
It's like asking whether a hammer is faster than a screwdriver.
The language-agnostic version (mostly; obviously this doesn't hold for declarative languages or other exotic ones):
When I was taught programming (quite a while ago, I'll freely admit), a language consisted of three ways of executing instructions:
sequence (doing things in order).
selection (doing one of many things).
iteration (doing something zero or more times).
The if and case statements are both variants on selection. If is used to select one of two different options based on a condition (using pseudo-code):
if condition:
    do option 1
else:
    do option 2
keeping in mind that the else may not be needed, in which case it's effectively "else do nothing". Also remember that option 1 or option 2 may itself consist of any of the statement types, including more if statements (called nesting).
Case is slightly different - it's generally meant for more than two choices like when you want to do different things based on a character:
select ch:
    case 'a','e','i','o','u':
        print "is a vowel"
    case 'y':
        print "never quite sure"
    default:
        print "is a consonant"
Note that you can use case for two options (or even one) but it's a bit like killing a fly with a thermonuclear warhead.
While is not a selection variant but an iteration one. It belongs with the likes of for, repeat, until and a host of other possibilities.
As to which is fastest, it doesn't matter in the vast majority of cases. The compiler writers know far more than we mere mortals about how to get the last bit of performance out of their code. You either trust them to do their job right, or you hand-code it in assembly yourself (I'd prefer the former).
You'll get far more performance by concentrating on the macro view rather than on minor things. That includes selecting appropriate algorithms, profiling, and targeting hot spots. It does little good to find something that takes five minutes each month and get it running in two minutes. Better to get a smaller improvement in something that happens every minute.
Language constructs like if, while, case and so on will already be as fast as they can be, since they're used heavily and are relatively simple. You should write your code for readability first and worry about performance only when it becomes an issue (see YAGNI).
Even if you found that using if/goto combinations instead of case allowed you to run a bit faster, the resulting morass of source code would be harder to maintain down the track.
while isn't a conditional; it is a loop. The difference is that the body of a while loop can be executed many times, whereas the body of a conditional is executed once or not at all.
The difference between if and switch is that if accepts an arbitrary expression as the condition, while switch only takes values to compare against. Basically, if you have a construct like if (x == 0) {} else if (x == 1) {} else if (x == 2) ..., it can be written much more concisely (and often more efficiently) using switch.
A case statement could be written as
if (a)
{
    // Do something
}
else if (b)
{
    // Do something else
}
But the case is much more efficient, since it evaluates the controlling expression only once and then branches.
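For illustration, in Java-like syntax (C and Objective-C look essentially the same here), the switch form of such a chain is sketched below; because the controlling expression is evaluated once, a compiler is free to emit a jump table instead of a chain of comparisons (value is a hypothetical variable):

static void handle(int value) {
    switch (value) {
        case 0:
            // Do something
            break;
        case 1:
            // Do something else
            break;
        default:
            // No case matched
            break;
    }
}

Note that this rewrite is only possible when all the conditions test the same value against constants; arbitrary boolean conditions still need an if/else chain.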
while is only useful if you want a condition to be evaluated, and the associated code block executed, multiple times. If you expect a condition to only occur once, then it's equivalent to if. A more apt comparison is that while is a more generalized for.
Each condition statement serves a different purpose and you won't use the same one in every situation. Learn which ones are appropriate for which situation and then write your code. If you profile your code and find there's a bottleneck, then you go ahead and address it. Don't worry about optimizing before there's actually a problem.
Are you asking whether an if structure will execute faster than a switch statement inside a large loop? If so, I put together a quick test. This code was put into the viewDidLoad method of a new view-based project I created with the latest Xcode and iPhone SDK:
NSLog(@"Begin loop");
NSDate *loopBegin = [NSDate date];
int ctr0, ctr1, ctr2, ctr3, moddedNumber;
ctr0 = 0;
ctr1 = 0;
ctr2 = 0;
ctr3 = 0;
for (int i = 0; i < 10000000; i++) {
    moddedNumber = i % 4;
    // 3.34, 1.23s in simulator
    if (moddedNumber == 0)
    {
        ctr0++;
    }
    else if (moddedNumber == 1)
    {
        ctr1++;
    }
    else if (moddedNumber == 2)
    {
        ctr2++;
    }
    else if (moddedNumber == 3)
    {
        ctr3++;
    }
    // 4.11, 1.34s on iPod Touch
    /*
    switch (moddedNumber)
    {
        case 0:
            ctr0++;
            break;
        case 1:
            ctr1++;
            break;
        case 2:
            ctr2++;
            break;
        case 3:
            ctr3++;
            break;
    }
    */
}
NSTimeInterval elapsed = [[NSDate date] timeIntervalSinceDate:loopBegin];
NSLog(@"End loop: %f seconds", elapsed);
This code sample is by no means complete, because, as pointed out earlier, if you have a case that comes up more often than the others, you would of course want to put it up front to reduce the total number of comparisons. It does show that the if structure executes a bit faster in a situation where the decisions are more or less equally divided among the branches.
Also, keep in mind that the results of this little test varied widely between running it on a device and running it in the simulator. The times cited in the code comments are from running on an actual device. (The first number shown is the time to run the loop the first time the code was run; the second is the time when running the same code again without rebuilding.)
There are conditional statements and conditional loops. (If Wikipedia is to be trusted, then simply referring to "a conditional" in programming doesn't cover conditional loops. But this is a minor terminology issue.)
Shmoopty said "Since these statements do different things, it is nonsensical to debate which is faster."
Well... it may be time poorly spent, but it's not nonsensical. For instance, let's say you have an if statement:
if (cond) {
    code
}
You can transform that into a loop that executes at most one time:
while (cond) {
    code
    break;
}
The latter will be slower in pretty much any language (or the same speed, because the optimizer turned it back into the original if behind the scenes!). Still, there are occasions in computer programming where (due to bizarre circumstances) the convoluted thing runs faster.
But those incidents are few and far between. The focus should be on your code--what makes it clearest, and what captures your intent.
Loops and branches are hard to explain briefly; getting the best code out of a construct in any C-style language depends on the processor used and the local context of the code. The main objective is to reduce breaking of the execution pipeline, primarily by reducing branch mispredictions.
I suggest you go here for all your optimization needs. The manuals are written for the C-style programmer and are relatively easy to understand if you know some assembly. These manuals should explain the subtleties of modern processors, the strategies used by top compilers, and the best ways to structure code to get the most out of it.
I just remembered the most important thing about conditionals and branching code: order your cases from most to least frequent, like so (empty bodies as placeholders):
if (x == 1) { /* ... */ }       // 80% of the time
else if (x == 2) { /* ... */ }  // 10% of the time
else if (x == 3) { /* ... */ }  // 6% of the time
else { /* ... */ }              // everything else
You must use an else sequence... and in this case the prediction logic in your CPU will predict correctly for x == 1 and avoid breaking your pipeline for 80% of all executions.
More information from Intel. In particular:
In order to effectively write your code to take advantage of these rules, when writing if-else or switch statements, check the most common cases first and work progressively down to the least common. Loops do not necessarily require any special ordering of code for static branch prediction, as only the condition of the loop iterator is normally used.
By following this rule you are flat-out giving the CPU hints about how to bias its prediction logic towards your chained conditionals.