Rough estimate of test cases

I'm curious how many test cases others have for a site similar to mine. It's your basic CRUD with business workflow website. 3 user roles, a couple input pages, a couple search pages, a business rule engine, etc. Maybe 50k lines of .NET code (workflow and persistence altogether). DB with about 10 main tables plus about 100 supporting tables (lookups, logs, etc.). The main UI for entering data is quite big, around 100 data fields, multiple grids, about 5 action/submit type buttons.
I know this is vague and I'm only hoping for order of magnitude figures. I'm also thinking of basic test cases, not code coverage type cases. But like if I told you we had 25 test cases I'm sure you'd say way WAY not enough. So I'm just looking for ballpark figures.
TIA

I would have as many test cases as it takes to ensure a high level of confidence in the system.
The number of tables, rules, lines of code, etc. is actually immaterial.
You should have the appropriate unit tests to ensure your domain objects and business rules are firing correctly. You should have tests to ensure your queries execute appropriately (this is a harder one).
You might even want to have test cases for paths through the software. In other words: click here, get this page, click there, edit a field, save the page, go back... This type is the most difficult, as the tests are usually recorded and have to be re-recorded when the pages change (i.e. a field is added or removed).
Generally speaking it's more about coverage than number of tests. You want your tests to cover as much of the application's functionality as is feasible. Note that I didn't say possible. You can cover an entire application (100%) with test cases, but for every little change, bug fix, etc. you'll have to recode those tests. This is more desirable for a mature app. For newer apps you don't want to hamstring your developers and QA team that way, as they'll spend inordinate amounts of time fixing/changing unit tests...
For any system, you could easily spend as much time developing your automated tests as you do the system itself. In some cases, even more.
As for our group, we tend to have lots of unit tests. However, for testing paths through the system we only record those once a particular area has moved into a "maintenance" type of mode. Meaning we expect little change for quite a while in that area and the path test is simply to ensure no one jacked it up.
UPDATE: the comments here led me to the following:
Going a little further: Let's examine 1 small piece of code:
Int32 AddNumbers(Int32 a, Int32 b) {
    return a + b;
}
On the face of it you could get away with a single test:
Int32 result = AddNumbers(1, 2);
Assert.AreEqual(3, result);
However, that probably isn't enough. What happens if you do this:
Int32 result = AddNumbers(Int32.MaxValue, 1);
Assert.AreEqual(Int32.MaxValue + 1L, result); // the true sum no longer fits in an Int32
Now we have a failure. Here's another one:
Int32 result = AddNumbers(Int32.MinValue, -1);
Assert.AreEqual(Int32.MinValue - 1L, result); // the true sum no longer fits in an Int32
So, we have an extremely simple method that requires at least 3 tests: the initial one to see if it can give a correct result at all, then 2 for bounds checking. That's 3 tests for essentially 2 lines of code (the method definition and the one-line computation).
As your code becomes more complex, things get really dicey:
Decimal DivideThis(Decimal a, Decimal b) {
    return Decimal.Divide(a, b);
}
This slight change introduces yet another exception condition beyond bounds: DivideByZero. So now we are up to 4 tests required for 2 lines of code.
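One of those four tests would exercise the new divide-by-zero path explicitly; a minimal sketch of just that case, assuming NUnit-style assertions:
Assert.Throws<DivideByZeroException>(() => DivideThis(10m, 0m));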
Now, let's simplify it a bit:
String AppendData(String data, String toAppend) {
    return String.Format("{0}{1}", data, toAppend);
}
Our test case here is:
String result = AppendData("Hello", "World");
Assert.AreEqual("HelloWorld", result);
That's just one test case for the code block, with no others really needed.
What does this tell us? For starters, 2 lines of code might cause us to need between 1 and 4 test cases. You mentioned 50k lines... Using that logic, you will need somewhere between 25,000 and 100,000 test cases...
Of course, life is rarely so simple. In those 50k lines of code you have, there are going to be large blocks of code that have very limited inputs. For example, a mortgage interest calculator might take 3 parameters and return 1 value (the APR). The code itself might run 100 lines or so (it's been a while, just work with me). The number of test cases for this is going to be determined by edge cases, along the lines of making sure you properly handle rounding.
So, let's say it's 5 cases, which brings us to roughly 20 lines of code per test case. Calculating that out, your 50k lines might result in 2,500 test cases. Obviously much smaller than what we expected above.
Finally, I'm going to throw another wrinkle into the mix. Some test systems can handle inputs and your assertions coming from a data file. Considering our first one we could have a data file that has a line for each parameter combination we want to test. In this scenario, we only need 1 test case to cover 3 (or more..) possible conditions.
The test case might look like (pseudo code):
read input file.
parse expected result, parameter 1, parameter 2
run method
assert method result = parsed result
repeat for each line of the file
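As a concrete sketch of that idea, here is roughly what a data-driven version of the AddNumbers tests could look like with NUnit's TestCaseSource; the framework choice and the file name add_cases.csv are illustrative assumptions, not something from the original post:

using System.Collections.Generic;
using System.IO;
using System.Linq;
using NUnit.Framework;

public class AddNumbersDataDrivenTests
{
    // Each line of add_cases.csv holds: expected,a,b
    public static IEnumerable<object[]> Cases() =>
        File.ReadLines("add_cases.csv")
            .Select(line => line.Split(','))
            .Select(p => new object[] { int.Parse(p[0]), int.Parse(p[1]), int.Parse(p[2]) });

    [TestCaseSource(nameof(Cases))]
    public void AddNumbers_MatchesExpected(int expected, int a, int b)
    {
        Assert.AreEqual(expected, AddNumbers(a, b));
    }

    // Local copy of the method under test, to keep the sketch self-contained.
    static int AddNumbers(int a, int b) => a + b;
}

One test method then covers as many parameter combinations as the file has lines.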
With that capability, we are down to 1 test case per scenario. I would say 1 per method, but the reality is that most methods are rarely standalone and it's entirely possible that numerous methods are implicitly tested through explicit testing of others; therefore not requiring their own individual tests.
This leads me to this: It is impossible to determine the right number of test cases without a full understanding of your code base. 5 cases that are at the UI level might be enough for complete coverage depending on the complexity of the tests; or it might take thousands. Therefore it's much better to base it on code coverage. What percentage of the code, and branching logic, are you testing?

If I asked a car salesman for a rough price of a car and he gave me a number on the spot, I wouldn't buy my car there, because he forgot to ask me some important questions. What kind of car do you want? Which extras do you want on the car? Etc.
Same for the number of test cases... If a hiring manager asked me that question, I would probably give the following answer.
#test cases = between #Requirements*2 and #Requirements*infinity (some requirements can lead to billions of possibilities)
I would also say that, based on my experience, the realistic number is #Requirements*5 (this is the number I use in the initial phase, for projects with new, changed and omitted functionality)
where the following error margin has to be applied, depending on the phase in which I am making the estimate:
Initiation phase: error margin = 400%
...
Testing phase: error margin = 10%
By the time you start the testing phase, detailed requirements/specs are available, the volatility of the requirements has stabilized, requirements creep is almost zero, etc.
At that point I will also be able to give better estimates...


What is the most conclusive way to evaluate an n-way split test where n > 2?

I have plenty of experience designing, running and evaluating two-way split tests (A/B Tests). Those are by far the most common in digital marketing, where I do most of my work.
However, I'm wondering if anything about the methodology needs to change when more variants are introduced into an experiment (creating, say, a 3-way test (A/B/C Test)).
My instinct tells me I should just run n-1 evaluations against the control group.
If I run a 3-way split test, for example, instinct says I should find significance and power twice:
Treatment A vs Control
Treatment B vs Control
So, in that case, I'm finding out which, if any, treatment performed better than the control (1-tailed test, alt: treatment - control > 0, the basic marketing hypothesis).
But, I'm doubting my instinct. It's occurred to me that running a third test contrasting Treatment A with Treatment B could yield confusing results.
For example, what if there's not enough evidence to reject a null that treatment B = treatment A?
That would lead to a goofy conclusion like this:
Treatment A = Control
Treatment B > Control
Treatment B = Treatment A
If treatments A and B are likely only different due to random chance, how could only one of them outperform the control?
And that's making me wonder if there's a more statistically sound way to evaluate split tests with more than one treatment variable. Is there?
Your instincts are correct, and you can feel less goofy by rewording your statements:
We could find no statistically significant difference between Treatment A and Control.
Treatment B is significantly better than Control.
However it remains inconclusive whether Treatment B is better than Treatment A.
This would be enough to declare Treatment B a winner, with the possible followup of retesting A vs B. But depending on your specific situation, you may have a business need to actually make sure Treatment B is better than Treatment A before moving forward and you can make no such decision with your data. You must gather more data and/or restart a new test.
What I've found to be a far more common scenario is that Treatment A and Treatment B both soundly beat control (as they're often closely related and have related hypotheses), but there is no statistically significant difference between Treatment A and Treatment B. This is an interesting scenario where, if you are required to pick a winner, it's okay to throw significance out the window and pick the one with the strongest effect. The reason is that the significance level (e.g. 95%) is set to avoid false positives and unnecessary changes; there's an assumption that there are switching costs. In this case you must pick A or B and throw out control, so in my opinion it's okay to pick the best one until you have more data.
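To make the "n-1 comparisons against the control" idea concrete, here is a rough sketch of two one-tailed two-proportion z-tests, one per treatment versus control. The conversion counts are invented and the normal-CDF approximation is just one common choice, so treat this as an illustration rather than a prescribed method:

using System;

class SplitTestSketch
{
    // One-tailed two-proportion z-test: H1 is that the treatment converts better than control.
    static double OneTailedPValue(int convT, int nT, int convC, int nC)
    {
        double pT = (double)convT / nT, pC = (double)convC / nC;
        double pPool = (double)(convT + convC) / (nT + nC);
        double se = Math.Sqrt(pPool * (1 - pPool) * (1.0 / nT + 1.0 / nC));
        return 1 - NormalCdf((pT - pC) / se); // small p-value => treatment beats control
    }

    // Abramowitz & Stegun style approximation of the standard normal CDF.
    static double NormalCdf(double x)
    {
        double t = 1 / (1 + 0.2316419 * Math.Abs(x));
        double d = 0.3989423 * Math.Exp(-x * x / 2);
        double p = d * t * (0.3193815 + t * (-0.3565638 + t * (1.781478 + t * (-1.821256 + t * 1.330274))));
        return x >= 0 ? 1 - p : p;
    }

    static void Main()
    {
        // Hypothetical (conversions, visitors) per arm: control 100/1000, A 120/1000, B 135/1000.
        Console.WriteLine($"A vs Control: p = {OneTailedPValue(120, 1000, 100, 1000):F4}"); // ~0.08, not significant at 5%
        Console.WriteLine($"B vs Control: p = {OneTailedPValue(135, 1000, 100, 1000):F4}"); // ~0.008, significant
    }
}

With these made-up numbers you get exactly the situation described above: B clearly beats control, A does not, and A vs. B on its own would likely be inconclusive.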

Is it worth introducing "incorrect" results to avoid crashing a program?

In my organisation, I see a lot of places where code has been put inside monitor blocks (RPG's version of try..except) to prevent raising exceptions on arithmetic errors. For instance:
Monitor;
  Pxxhour = Bctime / 60;
  PxxMin = %Rem(Bctime:60);
On-Error;
  Pxxhour = 0;
  PxxMin = 0;
Endmon;
Pxxhour and Pxxmin are screen fields that will be displayed to users. So if there is an error in the operations, these get a value of 0. Though this prevents the program from crashing, how does it help? Users keep seeing the wrong values on the screen. Similarly, I see code which assigns the highest possible value for a given variable rather than allowing an overflow exception. Though this will prevent the program from blowing up, how does it help in the long run? Wouldn't calculations have wrong values and result in wrong business data?
The answers given below by @jmarkmurphy and @Charles successfully address the question from an RPG and IBM midrange perspective, which is what I was after.
There are two use cases for a MONITOR block...
Expected errors
Unexpected errors
For expected errors, replacing bad or invalid data with an accepted value is a valid solution in some cases. The trick is knowing which cases. The answer to that is something your business people would need to help decide. It depends on what the program is doing and which data has the problem.
For instance, given some sort of internal sales report, you might have something like so:
dcl-c DIVIDE_BY_ZERO const(00102);
dcl-c RESULT_TOO_LARGE const(00103);
monitor;
  averageSale = totalSalesAmount / numberSales;
on-error DIVIDE_BY_ZERO;
  averageSale = 0;
on-error RESULT_TOO_LARGE;
  averageSale = *HIVAL;
endmon;
What's important about the above is that I'm expecting one of two possible errors and I've decided to handle them a certain way. The business people don't care that, technically, averageSale is undefined when numberSales is *ZERO. They just want a zero to appear on the report. They also understand that there's only so much room on the page and that if the number is all nines, the actual value might be bigger.
An unexpected error, such as a decimal data error, would not be caught by this MONITOR block.
For an unexpected error caught by a MONITOR block via an ON-ERROR with *ALL or no error code specified, I'd expect to see some sort of logging of the issue, followed by either skipping the problem record or cleanly shutting down, depending on what the program is doing in the first place.
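The same expected-versus-unexpected split translates outside RPG as well. A purely illustrative .NET sketch of the pattern (the method, values and logging here are my own invention, not part of the original answer):

using System;

class AverageSaleSketch
{
    // Handle the two expected errors the business has signed off on;
    // log and rethrow anything unexpected instead of hiding it.
    static decimal AverageSale(decimal totalSalesAmount, int numberSales)
    {
        try
        {
            return totalSalesAmount / numberSales;
        }
        catch (DivideByZeroException)   // expected: the business wants a zero on the report
        {
            return 0m;
        }
        catch (OverflowException)       // expected: show the largest representable value instead
        {
            return decimal.MaxValue;
        }
        catch (Exception ex)            // unexpected: record it, then let the caller decide
        {
            Console.Error.WriteLine($"Unexpected error computing average sale: {ex}");
            throw;
        }
    }

    static void Main()
    {
        Console.WriteLine(AverageSale(1000m, 0)); // prints 0
        Console.WriteLine(AverageSale(1000m, 8)); // prints 125
    }
}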
It appears that your code is expecting certain errors, but without explicitly defining which error codes it's willing to handle. This is lazy and not a good practice.
As for your question about whether or not the handling of those expected errors is valid... only you and your users can decide that.
You might want to take a look at Chapter 7 - Exception and error handling of the IBM Redbook Who Knew You Could Do That with RPG IV? Modern RPG for the Modern Programmer
What Should I Do When I Have Errors in my Calculations
Programs that blow up on users are bad, even if it is the user's fault. It makes the user believe that the program is buggy, and then anything unexpected that happens becomes the program's fault; something to be fixed. Things can get really out of hand in this manner causing help desk calls for ordinary occurrences that just appear a little odd, even when the outcome is actually correct.
One option is to validate the user input to prevent calculation errors, but what do you do when you can't really prevent all of them? In our world, one of these situations is in invoicing. 5250 screens have limited real estate and you can't always make the fields big enough to hold all eventualities. So there are tradeoffs. Maybe you need to be able to sell thousands of some small items on a single invoice, but the largest total invoice you have ever had is $100K. So you size your fields like this:
dcl-s quantity Packed(5:0);
dcl-s unitPrice Packed(7:2);
dcl-s amount Packed(9:2);
All the lengths are odd because a packed field with an odd number of digits takes up the same space on disk as the next lower even precision. You don't sell fractional quantities, and the maximum value in each field is:
quantity = 99,999;
unitPrice = $99,999.99;
amount = $9,999,999.99;
Now you can see that these maximums should easily handle all valid invoices, but they also leave plenty of potential for calculation errors. If the user keys in maximum numbers for quantity and unitPrice, the resulting number would require a Packed(12:2) field, and that would cause an overflow. When the unit price is stored in the invoice detail, we can add an edit, when the quantity and unit price are entered, that checks for an extended-amount overflow and sends an appropriate error message. But what if unit prices are not stored in the invoice detail, but in a pricing table instead? Then there is no good way, if a price is changed for example, to ensure that none of the existing invoices will be affected adversely.
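As a rough illustration of that kind of edit check (in C# rather than RPG, with the Packed(9:2) limit above hard-coded purely for the example):

using System;

static class InvoiceEdits
{
    // Mirrors the Packed(9:2) amount field above; purely illustrative.
    const decimal MaxAmount = 9_999_999.99m;

    // Reject the entry up front instead of letting the extended amount overflow later.
    public static bool ExtendedAmountFits(int quantity, decimal unitPrice)
        => quantity * unitPrice <= MaxAmount;

    static void Main()
    {
        Console.WriteLine(ExtendedAmountFits(10, 125.50m));        // True
        Console.WriteLine(ExtendedAmountFits(99_999, 99_999.99m)); // False: exceeds the amount field
    }
}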
So what do you do about a decimal overflow, or any other calculation error, be it a data problem or something else? And what happens when the error does occur? Blowing up the program is not a good option. Another option, the one that seems to have been taken in the code in the question, is to apply some default value that the users will quickly recognize as out of the ordinary. It will appear in reports and on screens, and when the users see those excessively large or small numbers, they know to go back and check the data.

Memory efficiency in If statements

I'm thinking more about how much system memory my programs will use nowadays. I'm currently doing A level Computing at college and I know that in most programs the difference will be negligible but I'm wondering if the following actually makes any difference, in any language.
Say I wanted to output "True" or "False" depending on whether a condition is true. Personally, I prefer to do something like this:
Dim result As String
If condition Then
    result = "True"
Else
    result = "False"
End If
Console.WriteLine(result)
However, I'm wondering if the following would consume less memory, etc.:
If condition Then
    Console.WriteLine("True")
Else
    Console.WriteLine("False")
End If
Obviously this is a very much simplified example and in most of my cases there is much more to be outputted, and I realise that in most commercial programs these kind of statements are rare, but hopefully you get the principle.
I'm focusing on VB.NET here because that is the language used for the course, but really I would be interested to know how this differs in different programming languages.
The main issue making ifs fast or slow is predictability.
Modern CPUs (anything after 2000) use a mechanism called branch prediction.
Read up on branch prediction first, then read on below...
Which is faster?
The if statement constitutes a branch, because the CPU needs to decide whether to follow or skip the if part.
If it guesses the branch correctly, the jump will execute in 0 or 1 cycle (1 nanosecond on a 1 GHz computer).
If it does not guess the branch correctly, the jump will take 50 cycles, give or take (roughly 1/20th of a microsecond).
Therefore to even feel these differences as a human, you'd need to execute the if statement many millions of times.
The two statements above are likely to execute in exactly the same amount of time, because:
assigning a value to a variable takes negligible time; on average less than a single CPU cycle on a superscalar CPU*.
calling a function with a constant parameter requires the use of an invisible temporary variable; so in all likelihood code A compiles to almost the exact same object code as code B.
*) All current CPUs are superscalar.
Which consumes less memory
As stated above, both versions need to put the boolean into a variable.
Version A uses an explicit one, declared by you; version B uses an implicit one declared by the compiler.
However, version A is guaranteed to have only one call site for the function WriteLine.
Whilst version B may (or may not) end up with two call sites for WriteLine in the compiled code.
If the optimizer in the compiler is good, code B will be transformed into code A; if it's not, it will keep the redundant call.
How bad is the waste
The call takes about 10 bytes for the assignment of the string (Unicode 2 bytes per char).
But so does the other version, so that's the same.
That leaves 5 bytes for a call, plus maybe a few extra bytes to set up a stack frame.
So let's say that due to your totally horrible coding you have now wasted 10 bytes.
Not much to worry about.
From a maintainability point of view
Computer code is written for humans, not machines.
So from that point of view code A is clearly superior.
Imagine not choosing between 2 options (true or false) but 20.
You only call the function once.
If you decide to change the WriteLine for another function you only have to change it in one place, not two or 20.
How to speed this up?
With 2 values it's pretty much impossible, but if you had 20 values you could use a lookup table.
Obviously that optimization is not worth it unless code gets executed many times.
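As a sketch of the lookup-table idea (in C# for brevity; the status codes and names here are made up), one table lookup replaces a whole chain of ifs:

using System;
using System.Collections.Generic;

class LookupTableSketch
{
    // One dictionary lookup instead of up to 20 compare-and-branch pairs.
    static readonly Dictionary<int, string> StatusNames = new Dictionary<int, string>
    {
        { 0, "Pending" },
        { 1, "Approved" },
        { 2, "Rejected" },
        // ... one entry per value instead of one if per value
    };

    static void Main()
    {
        int status = 1; // hypothetical input
        Console.WriteLine(StatusNames.TryGetValue(status, out var name) ? name : "Unknown");
    }
}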
If you need to know the precise amount of memory the instructions are going to take, you can use ildasm on your code, and see for yourself. However, the amount of memory consumed by your code is much less relevant today, when the memory is so cheap and abundant, and compilers are smart enough to see common patterns and reduce the amount of code that they generate.
A much greater concern is readability of your code: if a complex chain of conditions always leads to printing a conditionally set result, your first code block expresses this idea in a cleaner way than the second one does. Everything else being equal, you should prefer whatever form of code that you find the most readable, and let the compiler worry about optimization.
P.S. It goes without saying that Console.WriteLine(condition) would produce the same result, but that is of course not the point of your question.

Creating a testing strategy to check data consistency between two systems

A quick search over Stack Overflow didn't turn up anything, so here is my question.
I am trying to write down the testing strategy for an application where two systems sync with each other every day to keep a huge amount of data in sync.
As it's a huge amount of data, I don't really want to cross-check everything, but just want to do a random check every time a data sync happens. What should the strategy be for such a system?
I am thinking of these two approaches:
1) Get a count of all records and cross-check that both are the same.
2) Choose 5 random data entries and verify that their properties are in sync.
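For illustration only, a minimal sketch of those two checks, treating each system's export as a key-to-record map (the record format and sample size are assumptions, not part of the question):

using System;
using System.Collections.Generic;
using System.Linq;

class SyncSpotCheck
{
    // recordsA / recordsB: key -> serialized record, as exported from the two systems.
    static bool SpotCheck(IReadOnlyDictionary<string, string> recordsA,
                          IReadOnlyDictionary<string, string> recordsB,
                          int sampleSize)
    {
        // 1) Cheap check: both systems hold the same number of records.
        if (recordsA.Count != recordsB.Count) return false;

        // 2) Pick a few random keys and verify the records match field-for-field.
        var rng = new Random();
        var sample = recordsA.Keys.OrderBy(_ => rng.Next()).Take(sampleSize);
        return sample.All(k => recordsB.TryGetValue(k, out var b) && recordsA[k] == b);
    }

    static void Main()
    {
        var a = new Dictionary<string, string> { ["1001"] = "Alice;42", ["1002"] = "Bob;17" };
        var b = new Dictionary<string, string> { ["1001"] = "Alice;42", ["1002"] = "Bob;17" };
        Console.WriteLine(SpotCheck(a, b, sampleSize: 2)); // True when counts and samples agree
    }
}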
Any suggestion would be great.
What you need is known as Risk Management, in Software Testing it is called Software Risk Management.
It seems your question is not so much about how to test what you are about to test, but about how to describe what you do and why you do it (based on the question, I assume you need this explanation for yourself too...).
Adding SRM to your Test Strategy should describe:
The risks of not fully testing each and every piece of data in the mirrored system
A table scaling SRM against the amount of data tested (i.e. the probability of error if only n% of the data is tested versus, e.g., 2n%); in other words, stating, for example, a 5% risk of lost/invalid/corrupted data if x% of the data is tested with a k minute/hour execution time
Based on the previous point, a breakdown of the resources used for the different options (e.g. hardware load of n% for n hours, y man-hours used, costs of HW/SW/HR use of z USD)
The probability, and cost, of errors/issues with the automation code (i.e. the data comparison goes wrong and produces a false positive or false negative, creating overhead for the DBAs, developers and/or testers)
What happens if the SRM option taken (e.g. 10% of data tested, giving a 3% risk of data corruption/loss and a 0.75% overhead risk of false positive/negative results) results in an actual failure, i.e. a reference to Business Continuity and the effects of loss of data, integrity, etc.
Everything else that comes to mind and that you feel applies to your current issue, in your current system, with your actual preferences.

Branch Coverage

When writing test cases which are supposed to achieve 100% branch coverage, is it OK to have one case that covers two branches and another case that covers only one?
Note: we are assuming there are only three branches in the code.
Edit: 3 branches means three basic if statements that are all separate from each other within a body of code, e.g.:
input (x, y)
if (x < 0)
    something
if (x == y)
    something
if (x > y)
    something
output (x)
I have one test case that covers the first branch and one test case that covers the other two branches
A test is a question that we have about the product. The idea of one test per (written) branch might be useful or silly.
I have some questions about the example you give
input (x, y)
if (x < 0)
    something
if (x == y)
    something
if (x > y)
    something
output (x)
What happens if x is greater than zero? Are you supposed to fall through? What's supposed to happen if x is less than y? Something? Nothing?
Here's the thing: code (and branch) (and condition) coverage are nice ideas. But what does it mean to "cover" a line or a branch or a condition? Is it to make sure that the program can work--that is, execute a given line/branch/condition without crashing? Or is it to make sure that the program will work?
---Michael B.
Personally, I focus on the behaviour of the code. There are three different possible ways for the code to be executed, so there should be three tests.
Think of it this way - if one of the two branches breaks, you might be unaware because the test still passes (as the other branch still works) but production code fails. Not ideal.
Yes, it does take more time, but in some cases it is worth it... 100% for everything? Maybe not to that extreme.
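For the example above, three behaviour-focused tests, one per branch, might look roughly like this (a C# sketch; the expected values are placeholders because the question never says what each "something" does):

using NUnit.Framework;

public class BranchCoverageTests
{
    [Test]
    public void NegativeX_TakesFirstBranch()
    {
        var result = Process(x: -1, y: 5);   // only (x < 0) is true
        Assert.AreEqual(-1, result);         // placeholder expectation for branch 1
    }

    [Test]
    public void EqualXAndY_TakesSecondBranch()
    {
        var result = Process(x: 3, y: 3);    // only (x == y) is true
        Assert.AreEqual(3, result);          // placeholder expectation for branch 2
    }

    [Test]
    public void XGreaterThanY_TakesThirdBranch()
    {
        var result = Process(x: 7, y: 2);    // only (x > y) is true
        Assert.AreEqual(7, result);          // placeholder expectation for branch 3
    }

    // Stand-in for the pseudocode under test; here it simply returns x, matching output(x).
    static int Process(int x, int y) => x;
}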
100% branch coverage? Does the person asking for something like that have any real-world test coverage experience? In my experience, for reasonably complex projects, obtaining 75-80% code coverage and around 60-70% branch coverage is the best one can hope for. These numbers are usually the raw, pre-analysis numbers. They go up (~92-95% code and 80-85% branch) after the snippets that are impossible to reach are eliminated, like Asserts, default switch cases, 'defense in depth' code paths and such.
As for your question: the fewer test cases you have, the better. Don't forget that tests take time not only to develop, but to run and to analyze failures for, too. After the first time you wait 4 days for the whole test suite to finish, you quickly learn the value of reducing your number of test cases to the minimum that gives confidence in the coverage.