Data Flow Coverage - testing

If you write a program, it is usually possible to drive it such that all paths are covered. Hence, 100% coverage is easy to obtain (ignoring unfeasible code paths which modern compilers catch anyways).
However, 100% code coverage should imply that all variable definition-use coverage is also achieved, because variables are defined within the program and used within it. If all code is covered, all DU pairs should also be covered.
Why then, is it said that path coverage is easier to obtain, but data flow coverage is not usually possible to achieve 100% ? I do not understand why not? What can be an example of that?

It's easier to achieve 100% code coverage than all of the possible inputs because the set of all possible inputs can be extremely large or practically unlimited. It would take too much time to test them all.
Let's look at a simple example function:
double invert(double x) {
return 1.0/x;
}
A unit test would could look like this:
double y = invert(5);
double expected = 1.0/5.0;
EXPECT_EQ( expected, y );
This test achieves 100% code coverage. However, it's only 1 in 1.8446744e+19 possible inputs (assuming a double is 64 bits wide).
The idea behind All-pairs Testing is that it's not practical to test every possible input, so we have to identify the ranges that would cover all cases.
With my invert() function, there are at least two sets that matter: {non-zero values} and {zero}.
We need to add another test, which covers the same code path, but has a different outcome:
EXPECT_THROWS( invert(0.0) );
Furthermore, since the test writer has to design the different possible sets of parameters to achieve full data input coverage to a test, it could be impossible to know what the correct sets are.
Consider this function:
double multiply(double x, double y);
My instinct would be to write tests for small numbers and another for big numbers, to test overflow.
However, the developer may have written it poorly, in this way:
double multiply(double x, double y) {
if(x==0) return 0;
return 1.0 / ( (1.0/x) * (1.0/y) );
}
If our tests didn't use 0 for y, then we'd miss a bug. Knowledge of how the algorithms are designed is very important in understanding the proper inputs for a unit test, and that's why the programmers who write the code need to be involved in unit testing.

Related

X and Y inputs in LabVIEW

I am new to LabVIEW and I am trying to read a code written in LabVIEW. The block diagram is this:
This is the program to input x and y functions into the voltage input. It is meant to give an input voltage in different forms (sine, heartshape , etc.) into the fast-steering mirror or galvano mirror x and y axises.
x and y function controls are for inputting a formula for a function, and then we use "evaluation single value" function to input into a daq assistant.
I understand that { 2*(|-Mpi|)/N }*i + -Mpi*pi goes into the x value. However, I dont understand why we use this kind of formula. Why we need to assign a negative value and then do the absolute value of -M*pi. Also, I don`t understand why we need to divide to N and then multiply by i. And finally, why need to add -Mpi again? If you provide any hints about this I would really appreciate it.
This is just a complicated way to write the code/formula. Given what the code looks like (unnecessary wire bends, duplicate loop-input-tunnels, hidden wires, unnecessary coercion dots, failure to use appropriate built-in 'negate' function) not much care has been given in writing it. So while it probably yields the correct results you should not expect it to do so in the most readable way.
To answer you specific questions:
Why we need to assign a negative value and then do the absolute value
We don't. We can just move the negation immediately before the last addition or change that to a subtraction:
{ 2*(|Mpi|)/N }*i - Mpi*pi
And as #yair pointed out: We are not assigning a value here, we are basically flipping the sign of whatever value the user entered.
Why we need to divide to N and then multiply by i
This gives you a fraction between 0 and 1, no matter how many steps you do in your for-loop. Think of N as a sampling rate. I.e. your mirrors will always do the same movement, but a larger N just produces more steps in between.
Why need to add -Mpi again
I would strongly assume this is some kind of quick-and-dirty workaround for a bug that has not been fixed properly. Looking at the code it seems this +Mpi*pi has been added later on in the development process. And while I don't know what the expected values are I would believe that multiplying only one of the summands by Pi is probably wrong.

Factorial code coverage

In testing a recursive function such as the following factorial method; is it necessary to test the default case? If I pass in 3 the output is 7 and code coverage reports show 100%. However, I didn't explicitly test factorial(0). What are your thoughts on this?
public class Factorial
{
public static double factorial(int x)
{
if (x == 0)
{
return 1.0;
}
return x + factorial(x - 1);
}
}
Code coverage doesn't tell you everything. In this case you'll get 100% line coverage for factorial(3) but it won't cover all "cases".
When testing a recursive function you'd want to test various cases:
Each of the base cases.
The recursive cases.
Incorrect input (e.g., negative numbers).
Any function-specific edge cases.
You can test less but you'll leave yourself open for potential bugs when the code is changed in the future.
Technically if you test[well, execute] factorial(2) you also test [execute] factorial(1) if your algorithm is correct.
How would your testing tool know that your code was correct? How would a tester that didn't write the code know your code was correct? The point of "test coverage" is to determine that in fact as much of the code has been tested [well, executed] regardless of whether it is algorithmically correct or not.
A line not executed is a line for which you have no evidence it works. This is what test coverage tells your. The purpose of your tests is to check that the application computes the correct answer. Coverage and tests serve two different purposes. Coverage just happens to take advantage of the fact that tests exercise code.

Fuzzy screenshot comparison with Selenium

I'm using Selenium to automate webpage functional testing. It's important for us to do a pixel-by-pixel comparison when we roll out new code, so we're using Selenium to take screenshots and comparing the base64 encoded strings to see if anything has changed.
We're finding that in practice, it's hard to get complete pixel consistency, especially with images. I would like minor blurriness / rendering artifacts to count as a "pass" instead of a "fail", so I'm wondering if there's a way of doing a fuzzy comparison to make our tests a bit less fragile.
I was thinking of maybe looking at the Levenshtein distance between the base64 strings as a starting point, but I don't really know if that's a good approach, or what the tolerances should be that distinguish "something moved on the page" from "rendering artifact". Any ideas / approaches?
So I ended up going with the ImageMagick command-line tool (because why re-invent image comparison). The "Peak Absolute Error" metric of the "compare" tool tells you how much you have to fuzz pixels before two images are identical. This seems to work well... for an image with slight graphical distortions, there might be a lot of pixels that don't match, but slight fuzzing is enough to make them match; but for two images that are actually different, even though most pixels might match, the ones that don't tend to be very different. Right now I'm checking for a PAE of less than 15% to see if the images should be counted as identical. Command line I'm using is:
compare -metric PAE original.png new.png comparison.png
Documentation on ImageMagick's compare tool is here: http://www.imagemagick.org/script/compare.php
I've been using perceptualdiff which uses a model of the human visual system to try to avoid reporting unnoticeable changes (the authors used for renderer regression testing). Usage is quite simple:
perceptualdiff -output diff.ppm baseline.png test.png
(where diff.ppm is a PPM format image highlighting the areas of difference)
The needle regression testing framework has support for using pdiff to compare screenshots:
http://needle.readthedocs.org/en/latest/#engines
Use an image format that does not create artifacts (like BMP or PNG) then you can do a per-pixel comparison.
I think you can check each pixel with a common Euclidean Distance.
To improve performance a little, do not calculate the square root but check the squares of the distances
// Maximum color distance allowed to define pixel consistency.
const float maxDistanceAllowed = 5.0;
// Square of the distance, used in calculations.
float maxD = maxDistanceAllowed * maxDistanceAllowed;
public bool isPixelConsistent(Color pixel1, Color pixel2)
{
// Euclidean distance in 3-dimensions.
float distanceSquared = (pixel1.R - pixel2.R)*(pixel1.R - pixel2.R) + (pixel1.G - pixel2.G)*(pixel1.G - pixel2.G) + (pixel1.B - pixel2.B)*(pixel1.B - pixel2.B);
// If the actual distance is less than the max allowed, the pixel is
// consistent and the method returns TRUE
return distanceSquared <= maxD;
}
Didn't test the C# code, but it should give you the idea. Give some tries and adjust the maxDistanceAllowed to your needs.
If anyone else is looking for something similar there is Depicted-dpxdt. It is designed to be used as part of a CI/CD process.
It combines perceptual diff with server, commandline tool, wrapper for phantom js.
It has functionality demonstrated like crawling entire site and comparing pages for differences.

What is the difference between an IF, CASE, and WHILE statement

I just want to know what the difference between all the conditional statements in objective-c and which one is faster and lighter.
One piece of advice: stop worrying about which language constructs are microscopically faster or slower than which others, and instead focus on which ones let you express yourself best.
If and case statements described
While statement described
Since these statements do different things, it is unproductive to debate which is faster.
It's like asking whether a hammer is faster than a screwdriver.
The language-agnostic version (mostly, obviously this doesn't count for declarative languages or other weird ones):
When I was taught programming (quite a while ago, I'll freely admit), a language consisted of three ways of executing instructions:
sequence (doing things in order).
selection (doing one of many things).
iteration (doing something zero or more times).
The if and case statements are both variants on selection. If is used to select one of two different options based on a condition (using pseudo-code):
if condition:
do option 1
else:
do option 2
keeping in mind that the else may not be needed in which case it's effectively else do nothing. Also remember that option 1 or 2 may also consist of any of the statement types, including more if statements (called nesting).
Case is slightly different - it's generally meant for more than two choices like when you want to do different things based on a character:
select ch:
case 'a','e','i','o','u':
print "is a vowel"
case 'y':
print "never quite sure"
default:
print "is a consonant"
Note that you can use case for two options (or even one) but it's a bit like killing a fly with a thermonuclear warhead.
While is not a selection variant but an iteration one. It belongs with the likes of for, repeat, until and a host of other possibilities.
As to which is fastest, it doesn't matter in the vast majority of cases. The compiler writers know far more than we mortal folk how to get the last bit of performance out of their code. You either trust them to do their job right or you hand-code it in assembly yourself (I'd prefer the former).
You'll get far more performance by concentrating on the macro view rather than the minor things. That includes selection of appropriate algorithms, profiling, and targeting of hot spots. It does little good to find something that take five minutes each month and get that running in two minutes. Better to get a smaller improvement in something happening every minute.
The language constructs like if, while, case and so on will already be as fast as they can be since they're used heavily and are relative simple. You should be first writing your code for readability and only worrying about performance when it becomes an issue (see YAGNI).
Even if you found that using if/goto combinations instead of case allowed you to run a bit faster, the resulting morass of source code would be harder to maintain down the track.
while isn't a conditional it is a loop. The difference being that the body of a while-loop can be executed many times, the body of a conditional will only be executed once or not at all.
The difference between if and switch is that if accepts an arbitrary expression as the condition and switch just takes values to compare against. Basically if you have a construct like if(x==0) {} else if(x==1) {} else if(x==2) ..., it can be written much more concisely (and effectively) by using switch.
A case statement could be written as
if (a)
{
// Do something
}
else if (b)
{
// Do something else
}
But the case is much more efficient, since it only evaluates the conditional once and then branches.
while is only useful if you want a condition to be evaluated, and the associated code block executed, multiple times. If you expect a condition to only occur once, then it's equivalent to if. A more apt comparison is that while is a more generalized for.
Each condition statement serves a different purpose and you won't use the same one in every situation. Learn which ones are appropriate for which situation and then write your code. If you profile your code and find there's a bottleneck, then you go ahead and address it. Don't worry about optimizing before there's actually a problem.
Are you asking whether an if structure will execute faster than a switch statement inside of a large loop? If so, I put together a quick test, this code was put into the viewDidLoad method of a new view based project I just created in the latest Xcode and iPhone SDK:
NSLog(#"Begin loop");
NSDate *loopBegin = [NSDate date];
int ctr0, ctr1, ctr2, ctr3, moddedNumber;
ctr0 = 0;
ctr1 = 0;
ctr2 = 0;
ctr3 = 0;
for (int i = 0; i < 10000000; i++) {
moddedNumber = i % 4;
// 3.34, 1.23s in simulator
if (moddedNumber == 0)
{
ctr0++;
}
else if (moddedNumber == 1)
{
ctr1++;
}
else if (moddedNumber == 2)
{
ctr2++;
}
else if (moddedNumber == 3)
{
ctr3++;
}
// 4.11, 1.34s on iPod Touch
/*switch (moddedNumber)
{
case 0:
ctr0++;
break;
case 1:
ctr1++;
break;
case 2:
ctr2++;
break;
case 3:
ctr3++;
break;
}*/
}
NSTimeInterval elapsed = [[NSDate date] timeIntervalSinceDate:loopBegin];
NSLog(#"End loop: %f seconds", elapsed );
This code sample is by no means complete, because as pointed out earlier if you have a situation that comes up more times than the others, you would of course want to put that one up front to reduce the total number of comparisons. It does show that the if structure would execute a bit faster in a situation where the decisions are more or less equally divided among the branches.
Also, keep in mind that the results of this little test varied widely in performance between running it on a device vs. running it in the emulator. The times cited in the code comments are running on an actual device. (The first time shown is the time to run the loop the first time the code was run, and the second number was the time when running the same code again without rebuilding.)
There are conditional statements and conditional loops. (If Wikipedia is to be trusted, then simply referring to "a conditional" in programming doesn't cover conditional loops. But this is a minor terminology issue.)
Shmoopty said "Since these statements do different things, it is nonsensical to debate which is faster."
Well... it may be time poorly spent, but it's not nonsensical. For instance, let's say you have an if statement:
if (cond) {
code
}
You can transform that into a loop that executes at most one time:
while (cond) {
code
break;
}
The latter will be slower in pretty much any language (or the same speed, because the optimizer turned it back into the original if behind the scenes!) Still, there are occasions in computer programming where (due to bizarre circumstances) the convoluted thing runs faster
But those incidents are few and far between. The focus should be on your code--what makes it clearest, and what captures your intent.
loops and branches are hard to explain briefly, to get the best code out of a construct in any c-style language depends on the processor used and the local context of the code. The main objective is to reduce the breaking of the execution pipeline -- primarily by reducing branch mispredictions.
I suggest you go here for all your optimization needs. The manuals are written for the c-style programmer and relatively easy to understand if you know some assembly. These manuals should explain to you the subtleties in modern processors, the strategies used by top compilers, and the best way to structure code to get the most out of it.
I just remembered the most important thing about conditionals and branching code. Order your code as follows
if(x==1); //80% of the time
else if(x==2); // 10% of the time
else if(x==3); //6% of the time
else break;
You must use an else sequence... and in this case the prediction logic in your CPU will predict correctly for x==1 and avoid the breaking of your pipeline for 80% of all execution.
More information from intel. Particularly:
In order to effectively write your code to take advantage of these rules, when writing if-else or switch statements, check the most common cases first and work progressively down to the least common. Loops do not necessarily require any special ordering of code for static branch prediction, as only the condition of the loop iterator is normally used.
By following this rule you are flat-out giving the CPU hints about how to bias its prediction logic towards your chained conditionals.

Process to pass from problem to code. How did you learn?

I'm teaching/helping a student to program.
I remember the following process always helped me when I started; It looks pretty intuitive and I wonder if someone else have had a similar approach.
Read the problem and understand it ( of course ) .
Identify possible "functions" and variables.
Write how would I do it step by step ( algorithm )
Translate it into code, if there is something you cannot do, create a function that does it for you and keep moving.
With the time and practice I seem to have forgotten how hard it was to pass from problem description to a coding solution, but, by applying this method I managed to learn how to program.
So for a project description like:
A system has to calculate the price of an Item based on the following rules ( a description of the rules... client, discounts, availability etc.. etc.etc. )
I first step is to understand what the problem is.
Then identify the item, the rules the variables etc.
pseudo code something like:
function getPrice( itemPrice, quantity , clientAge, hourOfDay ) : int
if( hourOfDay > 18 ) then
discount = 5%
if( quantity > 10 ) then
discount = 5%
if( clientAge > 60 or < 18 ) then
discount = 5%
return item_price - discounts...
end
And then pass it to the programming language..
public class Problem1{
public int getPrice( int itemPrice, int quantity,hourOdDay ) {
int discount = 0;
if( hourOfDay > 10 ) {
// uh uh.. U don't know how to calculate percentage...
// create a function and move on.
discount += percentOf( 5, itemPriece );
.
.
.
you get the idea..
}
}
public int percentOf( int percent, int i ) {
// ....
}
}
Did you went on a similar approach?.. Did some one teach you a similar approach or did you discovered your self ( as I did :( )
I go via the test-driven approach.
1. I write down (on paper or plain text editor) a list of tests or specification that would satisfy the needs of the problem.
- simple calculations (no discounts and concessions) with:
- single item
- two items
- maximum number of items that doesn't have a discount
- calculate for discounts based on number of items
- buying 10 items gives you a 5% discount
- buying 15 items gives you a 7% discount
- etc.
- calculate based on hourly rates
- calculate morning rates
- calculate afternoon rates
- calculate evening rates
- calculate midnight rates
- calculate based on buyer's age
- children
- adults
- seniors
- calculate based on combinations
- buying 10 items in the afternoon
2. Look for the items that I think would be the easiest to implement and write a test for it. E.g single items looks easy
The sample using Nunit and C#.
[Test] public void SingleItems()
{
Assert.AreEqual(5, GetPrice(5, 1));
}
Implement that using:
public decimal GetPrice(decimal amount, int quantity)
{
return amount * quantity; // easy!
}
Then move on to the two items.
[Test]
public void TwoItemsItems()
{
Assert.AreEqual(10, GetPrice(5, 2));
}
The implementation still passes the test so move on to the next test.
3. Be always on the lookout for duplication and remove it. You are done when all the tests pass and you can no longer think of any test.
This doesn't guarantee that you will create the most efficient algorithm, but as long as you know what to test for and it all passes, it will guarantee that you are getting the right answers.
the old-school OO way:
write down a description of the problem and its solution
circle the nouns, these are candidate objects
draw boxes around the verbs, these are candidate messages
group the verbs with the nouns that would 'do' the action; list any other nouns that would be required to help
see if you can restate the solution using the form noun.verb(other nouns)
code it
[this method preceeds CRC cards, but its been so long (over 20 years) that I don't remember where i learned it]
when learning programming I don't think TDD is helpful. TDD is good later on when you have some concept of what programming is about, but for starters, having an environment where you write code and see the results in the quickest possible turn around time is the most important thing.
I'd go from problem statement to code instantly. Hack it around. Help the student see different ways of composing software / structuring algorithms. Teach the student to change their minds and rework the code. Try and teach a little bit about code aesthetics.
Once they can hack around code.... then introduce the idea of formal restructuring in terms of refactoring. Then introduce the idea of TDD as a way to make the process a bit more robust. But only once they are feeling comfortable in manipulating code to do what they want. Being able to specify tests is then somewhat easier at that stage. The reason is that TDD is about Design. When learning you don't really care so much about design but about what you can do, what toys do you have to play with, how do they work, how do you combine them together. Once you have a sense of that, then you want to think about design and thats when TDD really kicks in.
From there I'd start introducing micro patterns leading into design patterns
I did something similar.
Figure out the rules/logic.
Figure out the math.
Then try and code it.
After doing that for a couple of months it just gets internalized. You don't realize your doing it until you come up against a complex problem that requires you to break it down.
I start at the top and work my way down. Basically, I'll start by writing a high level procedure, sketch out the details inside of it, and then start filling in the details.
Say I had this problem (yoinked from project euler)
The sum of the squares of the first
ten natural numbers is, 1^2 + 2^2 +
... + 10^2 = 385
The square of the sum of the first ten
natural numbers is, (1 + 2 + ... +
10)^2 = 55^2 = 3025
Hence the difference between the sum
of the squares of the first ten
natural numbers and the square of the
sum is 3025 385 = 2640.
Find the difference between the sum of
the squares of the first one hundred
natural numbers and the square of the
sum.
So I start like this:
(display (- (sum-of-squares (list-to 10))
(square-of-sums (list-to 10))))
Now, in Scheme, there is no sum-of-squares, square-of-sums or list-to functions. So the next step would be to build each of those. In building each of those functions, I may find I need to abstract out more. I try to keep things simple so that each function only really does one thing. When I build some piece of functionality that is testable, I write a unit test for it. When I start noticing a logical grouping for some data, and the functions that act on them, I may push it into an object.
I've enjoyed TDD every since it was introduced to me. Helps me plan out my code, and it just puts me at ease having all my tests return with "success" every time I modify my code, letting me know I'm going home on time today!
Wishful thinking is probably the most important tool to solve complex problems. When in doubt, assume that a function exists to solve your problem (create a stub, at first). You'll come back to it later to expand it.
A good book for beginners looking for a process: Test Driven Development: By Example
My dad had a bunch of flow chart stencils that he used to make me use when he was first teaching me about programming. to this day I draw squares and diamonds to build out a logical process of how to analyze a problem.
I think there are about a dozen different heuristics I know of when it comes to programming and so I tend to go through the list at times with what I'm trying to do. At the start, it is important to know what is the desired end result and then try to work backwards to find it.
I remember an Algorithms class covering some of these ways like:
Reduce it to a known problem or trivial problem
Divide and conquer (MergeSort being a classic example here)
Use Data Structures that have the right functions (HeapSort being an example here)
Recursion (Knowing trivial solutions and being able to reduce to those)
Dynamic programming
Organizing a solution as well as testing it for odd situations, e.g. if someone thinks L should be a number, are what I'd usually use to test out the idea in pseudo code before writing it up.
Design patterns can be a handy set of tools to use for specific cases like where an Adapter is needed or organizing things into a state or strategy solution.
Yes.. well TDD did't existed ( or was not that popular ) when I began. Would be TDD the way to go to pass from problem description to code?... Is not that a little bit advanced? I mean, when a "future" developer hardly understand what a programming language is, wouldn't it be counterproductive?
What about hamcrest the make the transition from algorithm to code.
I think there's a better way to state your problem.
Instead of defining it as 'a system,' define what is expected in terms of user inputs and outputs.
"On a window, a user should select an item from a list, and a box should show him how much it costs."
Then, you can give him some of the factors determining the costs, including sample items and what their costs should end up being.
(this is also very much a TDD-like idea)
Keep in mind, if you get 5% off then another 5% off, you don't get 10% off. Rather, you pay 95% of 95%, which is 90.25%, or 9.75% off. So, you shouldn't add the percentage.