In behavior-based testing, the number of error scenarios appears to grow exponentially.
According to Aslak Hellesøy, BDD was created to combine automated acceptance tests, functional requirements, and software documentation:
In 2003 I became part of a small clique of people from the XP community who were exploring better ways to do TDD. Dan North named this BDD. The idea was to combine automated acceptance tests, functional requirements and software documentation into one format that would be understandable by non-technical people as well as testing tools.
Software development teams use JBehave, created by Dan North, as a tool for BDD testing.
Because there are many possible negative cases, the number of negative scenarios in a JBehave test suite can grow very large. The time taken to run the test suite, as well as to modify the product, increases as these scenarios accumulate. In particular, I feel the suite is becoming hard to maintain as documentation of the product.
I am not sure whether this is an abuse of BDD/JBehave concepts caused by misunderstandings within different teams, or whether this is the way it should be.
Let me explain this concern with an example.
Say an application has a behavior to order an item via a REST service.
PUT /order
{
// JSON body with 3 mandatory parameters and 2 optional parameters
}
Happy scenarios
Invoke REST endpoint with correct values for all 3 mandatory parameters
Invoke REST endpoint with correct values for all 5 parameters
Negative scenarios
There are a lot of negative scenarios that we can come up with.
Input value based scenarios
Mandatory parameter 1 is set to null, with correct values for the other two mandatory parameters (3 possible scenarios, one per mandatory parameter)
Mandatory parameter 1 is set to empty, with correct values for the other two mandatory parameters (3 possible scenarios, one per mandatory parameter)
Mandatory parameter 1 is set to a value in an invalid format, with correct values for the other two mandatory parameters (3 possible scenarios, one per mandatory parameter)
Mandatory parameters 1 & 2 are set to null, with a correct value for the remaining mandatory parameter (3 possible scenarios, one per pair of parameters)
Likewise, we can write 3^3 scenarios just for those three parameters, and the count grows exponentially with the number of parameters.
Then we can bring the optional parameters into the equation as well and come up with even more scenarios (say, optional parameters with null, empty, and invalid-format values).
Payment ability based scenarios
Based on the available money, there will be further scenarios.
Delivery location based scenarios
Based on the delivery possibilities, there will be further scenarios.
Question/Concern
Should all these negative scenarios (and more) be part of a JBehave-based test suite? If so, is there any advice on how to make such a suite more maintainable?
It helps a lot to know what the tested application does internally in its own validation processes, specifically the order of validation.
In a simplified example of three required parameters, you really only need three scenarios: one for each parameter. If you know the application fails as soon as parameter one is invalid, you don't need to check that again when you test parameter two in another scenario, since the second parameter would never be validated after the first one fails. So instead of three times three, you simply have three:
1) invalid, valid, valid.
2) valid, invalid, valid.
3) valid, valid, invalid.
That is, unless the application DOES check all three parameters and reports accordingly that one or more of them were invalid. Speaking as a developer who now does automation, I can tell you that unless I thought multiple invalid parameters were highly probable, I would only check parameters one at a time and fail out with an error on the first invalid one. Having written accounting software, I saw times when it was logical to validate all parameters and report accordingly, but that was the exception rather than the rule. If you know what the application is checking, and in what order, you can write better test scripts, but I realize that is not always possible.
There is still the question of the seemingly limitless kinds of invalid data, so even in my simplified example you could have lots of tests. But that situation can be dealt with by parameterizing the invalid values: you could still limit yourself to just three scenarios, each taking any number of invalid parameter values to test.
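To sketch that idea in code (using Python's pytest here purely for illustration; place_order and the 400/201 status codes stand in for the asker's PUT /order endpoint and are not from the original question):

import pytest

# Minimal stand-in for the PUT /order call; a real suite would issue an
# HTTP request. Validation fails fast on the first invalid parameter.
class Response:
    def __init__(self, status_code):
        self.status_code = status_code

def place_order(param1, param2, param3):
    for value in (param1, param2, param3):
        if value is None or value == "":
            return Response(400)
    return Response(201)

INVALID_VALUES = [None, ""]  # extend with invalid-format values as needed

# One scenario per parameter position, each reused for every kind of
# invalid value: three scenarios instead of 3^3 combinations.
@pytest.mark.parametrize("bad", INVALID_VALUES)
def test_invalid_first_parameter(bad):
    assert place_order(bad, "b", "c").status_code == 400

@pytest.mark.parametrize("bad", INVALID_VALUES)
def test_invalid_second_parameter(bad):
    assert place_order("a", bad, "c").status_code == 400

@pytest.mark.parametrize("bad", INVALID_VALUES)
def test_invalid_third_parameter(bad):
    assert place_order("a", "b", bad).status_code == 400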
I hope I understood your question correctly and offered some useful information.
Related
Is it possible for a program to have a fault that dynamic testing cannot find? Any simple example?
Yes. Testing can only prove the absence of bugs for what you tested. Dynamic testing cannot cover all possible inputs and outputs in all environments with all dependencies.
There are many ways a fault can slip past dynamic testing. The first is to simply not test the code in question. You can detect this by checking the coverage of your tests, but even with 100% coverage there can still be flaws.
Next is failing to check all possible types and ranges of inputs. For example, if you have a function that scans for a word in a string, you need to check for...
The word at the start of the string.
The word at the end of the string.
The word in the middle of the string.
A string without the word.
The empty string.
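Sketched as a table-driven test (contains_word is a hypothetical function under test, written here only so the example runs):

import pytest

def contains_word(text, word):
    # Hypothetical implementation: does `word` appear as a whole word?
    return word in text.split()

@pytest.mark.parametrize("text, expected", [
    ("fox jumps over", True),   # word at the start
    ("over the fox", True),     # word at the end
    ("the fox jumps", True),    # word in the middle
    ("no match here", False),   # string without the word
    ("", False),                # the empty string
])
def test_contains_word(text, expected):
    assert contains_word(text, "fox") == expected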
Cases like these are known as boundary conditions and include things like:
0
Negative numbers
Empty strings
Null
Extremely large values
Decimals
Unicode
Empty files
Extremely large files
If the code in question keeps state, maybe in an object, maybe in global variables, you have to test that state does not become corrupted or interfere with subsequent runs.
If you're doing parallel processing you must test any number of possibilities for deadlocks or corruption resulting from trying to do the same thing at the same time. For example, two processes trying to write to the same file. Or two processes both waiting for a lock on the same resource. Do they lock only what they need? Do they give up their locks ASAP?
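As a minimal sketch of the lock-ordering hazard (Python threading; a test's job would be to provoke exactly this interleaving):

import threading
import time

lock_a = threading.Lock()
lock_b = threading.Lock()

def worker(first, second, name):
    # The two workers take the locks in opposite orders -- the classic
    # recipe for deadlock.
    with first:
        time.sleep(0.1)  # widen the window so both grab their first lock
        if second.acquire(timeout=1):
            second.release()
            print(f"{name}: ok")
        else:
            print(f"{name}: possible deadlock, gave up after 1s")

t1 = threading.Thread(target=worker, args=(lock_a, lock_b, "t1"))
t2 = threading.Thread(target=worker, args=(lock_b, lock_a, "t2"))
t1.start(); t2.start()
t1.join(); t2.join()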
Once you test all the ways the code is supposed to work, you have to test all the ways that it can fail, whether it fails gracefully with an exception (instead of garbage), whether an error leaves it in a corrupted state, and so on. How does it handle resource failure, like failing to connect to a database? This becomes particularly important working with databases and files to ensure a failure doesn't leave things partially altered.
For example, if you're transferring money from one account to another you might write:
my $from_balance = get_balance($from);
my $to_balance   = get_balance($to);
# Two separate writes: a crash or a concurrent update between them
# leaves the accounts inconsistent.
set_balance($from, $from_balance - $amount);
set_balance($to,   $to_balance + $amount);
What happens if the program crashes after the first set_balance? What happens if another process changes either balance between get_balance and set_balance? These sorts of concurrency issues must be thought of and tested.
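One common mitigation to test for is performing both writes in a single transaction, so a crash rolls the whole transfer back. A minimal sqlite3 sketch (the schema and account ids are made up; the original example's get_balance/set_balance helpers are replaced here by SQL updates):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 0)])

def transfer(conn, src, dst, amount):
    # Both updates commit together or not at all; an exception or crash
    # inside the block rolls the transaction back.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                     (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                     (amount, dst))

transfer(conn, "alice", "bob", 40)
print(conn.execute("SELECT id, balance FROM accounts").fetchall())
# [('alice', 60), ('bob', 40)]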
There's all the different environments the code could run in. Different operating systems. Different compilers. Different dependencies. Different databases. And all with different versions. All these have to be tested.
The test can simply be wrong. It can be a mistake in the test. It can be a mistake in the spec. Generally one tests the same code in different ways to avoid this problem.
The test can be right, the spec can be right, but the feature is wrong. It could be a bad design. It could be a bad idea. You can argue this isn't a "bug", but if the users don't like it, it needs to be fixed.
If your testing makes use of a lot of mocking, your mocks may not reflect how the thing being mocked actually behaves.
And so on.
For all these flaws, dynamic testing remains the best we've got for testing more than a few dozen lines of code.
I read Bob Martin's brilliant article on how "Given-When-Then" can actually be compared to a finite state machine (FSM). It got me thinking: is it OK for a BDD test to have multiple "When"s?
For example:
GIVEN my system is in a defined state
WHEN an event A occurs
AND an event B occurs
AND an event C occurs
THEN my system should behave in this manner
I personally think these should be 3 different tests for good separation of intent. But other than that, are there any compelling reasons for or against this approach?
When multiple steps (WHEN) are needed before you reach your actual assertion (THEN), I prefer to group them in the initial-condition part (GIVEN) and keep only one in the WHEN section. This shows that the event that really triggers the "action" of my SUT is that one, and that the previous ones are merely steps to get there.
Your test would become:
GIVEN my system is in a defined state
AND an event A occurs
AND an event B occurs
WHEN an event C occurs
THEN my system should behave in this manner
but this is more of a personal preference I guess.
If you truly need to test that a system behaves in a particular manner under those specific conditions, it's a perfectly acceptable way to write a test.
I found that another limiting factor can appear in an E2E testing scenario where you want to reuse a statement multiple times. The BDD framework of my choice (pytest_bdd) is implemented so that a given statement has a single return value, which it maps to a step's input parameters automagically via the name of the function bound to the given step. This design prevents reusability, and in my case I wanted reusability: I needed to create objects and add them to a sequence object provided by another given statement. I worked around the limitation with a test fixture (which I named test_context), a Python dictionary (a hashmap), used from when statements, which do not have the same single-return-value restriction. The '(when) add object to sequence' step looked up the sequence in the context and appended the object in question to it, so I could reuse the add-object-to-sequence action multiple times.
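A minimal sketch of that workaround (the step texts, function names, and the sequences.feature file are invented for illustration; the point is the shared test_context dict):

import pytest
from pytest_bdd import given, when, then, scenarios

# Assumes a feature file along these lines:
#   Feature: sequences
#     Scenario: building a sequence from reusable steps
#       Given an empty sequence
#       When I add an object to the sequence
#       And I add an object to the sequence
#       Then the sequence contains 2 objects
scenarios("sequences.feature")

@pytest.fixture
def test_context():
    # One shared hashmap per test; every step can read or write it,
    # sidestepping the one-return-value-per-given limitation.
    return {}

@given("an empty sequence")
def empty_sequence(test_context):
    test_context["sequence"] = []

@when("I add an object to the sequence")
def add_object_to_sequence(test_context):
    # Reusable any number of times within a scenario.
    test_context["sequence"].append(object())

@then("the sequence contains 2 objects")
def sequence_has_two_objects(test_context):
    assert len(test_context["sequence"]) == 2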
This requirement was tricky because BDD aims to be descriptive. I could have used a single given statement with a pickled memory map of the sequence object I wanted to perform the test action on, but would that have been useful? I think not. I needed to get the sequence constructed first, and that required reusable statements. Although this is not in the BDD bible, I think it is, in the end, a practical and pragmatic solution to a very real E2E descriptive-testing problem.
I need a simple explanation of what an error-guessing test case is. Is it dangerous to use? I would appreciate an example.
Best regards,
Erica
Error guessing is documented here: http://en.wikipedia.org/wiki/Error_guessing
It's a name for something that's very common -- guessing where errors might occur based on your previous experience.
For example, say you have a routine that checks whether a value entered by a user at a terminal is a prime number.
You'd test the cases where errors tend to occur:
Empty input
Values that are not integers (floating point, letters, etc)
Values that are boundary cases (2, 3, 4)
etc.
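For instance, those guesses could become a small table-driven test (is_prime here is a stand-in implementation so the sketch runs; it is not from the original answer):

import pytest

def is_prime(n):
    # Stand-in routine under test.
    if not isinstance(n, int) or n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

# Error-guessing cases: inputs where, from experience, bugs tend to hide.
@pytest.mark.parametrize("value, expected", [
    (2, True),     # smallest prime, a classic off-by-one boundary
    (3, True),
    (4, False),    # first composite
    (1, False),    # just below the boundary
    (0, False),
    (-7, False),   # negative input
    (2.5, False),  # not an integer
    ("", False),   # empty input
])
def test_is_prime_error_guesses(value, expected):
    assert is_prime(value) == expected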
I would assume that every tester/QA person would be asked questions like this during an interview. It gives you a chance to talk about what procedures you've used in the past during testing.
I think the method goes like this:
1. Do formal testing
2. Use knowledge gained during formal testing about how the system works to make a list of places where defects might be
3. Design tests to verify whether those defects exist.
By its nature this process is very ad-hoc and unstructured.
I'm new to Selenium, and also to fuzz testing. I see that Selenium IDE only allows fixed test cases, but fuzz testing seems like it would be helpful.
So: what is behind fuzz testing, what kinds of tests does Selenium offer, and is this black-box or white-box testing?
Any help would be appreciated.
For a short answer:
Selenium is mostly about black-box testing, but you can also do some whiter-box testing with it.
Selenium RC gives you much more freedom to do fuzz testing than Selenium IDE.
For a long answer, see below:
In this post I will try to explain the concept of randomly testing your web application using Selenium RC.
Generally speaking, a black-box testing tool like Selenium gives you the freedom to
(1) Enter any value to a certain field
(2) Choose any field to test in a certain HTML form
(3) Choose any execution order/step to test a certain set of fields.
Basically you
use (1) to test a specific field in your HTML form (did you choose a good maximum length for the field?), your JavaScript handling of that field's value (e.g. turning "t" into today's date, or "+1" into tomorrow's date), and your back-end database's handling of that value (VARCHAR length, conversion of a numerical string into a numerical value, ...).
use (2) to test ALL possible fields
use (3) to test the interaction of the fields with each other: does a JavaScript alert pop up if the username field was not entered before the password field; does a database (e.g. Oracle) trigger fire when a certain condition is not met.
Note that testing EVERYTHING (all states of your program, constructed from all possible combinations of all variables) is not possible even in theory (e.g. consider testing a small function that parses a string: how many possible values does a string have?). Therefore, in reality, given limited resources (time, money, people), you want to test only the "most crucial" execution paths of your web application. A path is more "crucial" if it has more of these properties: (a) it is executed frequently, (b) a deviation from the specification causes serious loss.
Unfortunately, it is hard to know which execution cases are crucial unless you record all use cases of your application and select the most frequent ones, which is a very time-consuming process. Furthermore, even a bug in the least-executed use case can cause a lot of trouble if it is a security hole (e.g. someone steals all customers' passwords through a tiny bug in the URL handling of some PHP page).
That is why you need to randomly scan the testing space (i.e. the space of values used in those use cases), in the hope of running something and scanning everything. This is called fuzz testing.
Using Selenium RC you could easily do all the phases (1), (2) and (3): testing any value in any field under any execution step by doing some programming in a supported language like Java, PHP, CSharp, Ruby, Perl, Python.
The following are the steps to carry out phases (1), (2) and (3):
Create a list of your HTML fields so that you can easily iterate through them. If your HTML fields are not structured enough (for legacy reasons), consider adding a new attribute carrying a specific id, e.g. selenium-id, to each HTML element, to (1) simplify XPath formation, (2) speed up XPath resolution and (3) avoid translation hassle. When choosing values for these newly added selenium-ids, you can make iterating while fuzzing easier by (a) using consecutive numbers or (b) using names that follow a consistent pattern.
Create a random variable to control the step, say rand_step
Create a random variable to control the field, say rand_field
Finally, create a random variable to control the value entered into a certain field, say rand_value.
Now, inside your fuzzing algorithm, iterate first through the values of rand_step, then within each such iteration iterate through rand_field, and finally through rand_value.
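Here is a rough sketch of such a loop using Selenium's Python WebDriver bindings (the modern successor to Selenium RC); the URL, field ids and fuzz values are assumptions for illustration:

import random
from selenium import webdriver
from selenium.webdriver.common.by import By

# Hypothetical selenium-id values added to the form, as suggested above.
FIELD_IDS = ["selenium-id-1", "selenium-id-2", "selenium-id-3"]
FUZZ_VALUES = ["", "a" * 10000, "' OR 1=1 --", "<script>alert(1)</script>", "-1"]

random.seed(42)  # fixed seed so a failing run can be reproduced

driver = webdriver.Firefox()
driver.get("http://localhost:8000/form")  # assumed application under test

for step in range(100):                    # rand_step: which iteration
    field = random.choice(FIELD_IDS)       # rand_field: which field
    value = random.choice(FUZZ_VALUES)     # rand_value: which input
    element = driver.find_element(By.ID, field)
    element.clear()
    element.send_keys(value)
    # Log every case so failures can be replayed later.
    print(f"step={step} field={field} value={value!r}")

driver.quit()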
That said, fuzz testing helps you scan your whole application's use-case value space within a limited execution time. It is said that "a plague of new vulnerabilities emerge that affected popular client-side applications including Microsoft Internet Explorer, Microsoft Word and Microsoft Excel; a large portion of these vulnerabilities were discovered through fuzzing".
But fuzz testing does not come without drawbacks. One of them is the difficulty of reproducing a test case given all that randomness. You can overcome this limitation by doing one of the following:
Generating the test cases beforehand in a batch file to be used over a certain period of time, and applying this file gradually
Generating the test cases on the fly, together with logging down those cases
Logging down only the failed cases.
To say more on whether Selenium is black-box or white-box:
Definitions of black-box and white-box:
Black box: checks whether one box (usually the whole app) delivers the correct outputs while being fed inputs. Theoretically, your application is bug-free if ALL possible input-output pairs are verified.
White box: checks the control flow of the source code. Theoretically, your application is bug-free if ALL execution paths are visited without problem.
But in real life, you cannot verify ALL input-output pairs, nor visit ALL execution paths, because you always have limited resources in
Time
Money
People
With Selenium, you mimic the user by entering a value or performing a certain click on a web application, and you check whether the browser gives you the behavior you want. You don't know, and don't care, how the inner functionality of the web application actually works. That's why typical Selenium testing is black-box testing.
Fairly new to BDD and RSpec, and I'm really curious as to what people typically do when writing their RSpec tests/examples, specifically as it relates to positive and negative tests of the same thing.
Take for example validation for a username and the rule that a valid username contains only alphanumeric characters.
The affirmative/positive test would be something like this:
it "should be valid if it contains alphanumeric characters"
username = 'abc123'
username.should be_valid
end
While the negative test would be something like this:
it "should be invalid if it contains non-alphanumeric characters"
username = '%as.12-'
username.should_not be_valid
end
Would you write one test, but not the other? Would you write both? Would you put them together in the same test? I've seen examples where people do any of the above, so I was wondering if there is a best practice and if so, what is it?
Example of writing both positive and negative test:
it "should be invalid if it contains non-alphanumeric characters"
username = '%as.12-'
username.should_not be_valid
username = 'abc123'
username.should be_valid
end
I've seen examples of people doing it this way, but I'm honestly not a fan of this approach. I tend to err on the side of keeping things clean and distinct with a single purpose, much like how we should write methods, so I would be more likely to write two separate tests instead of putting them together in one. So is there a best practice that states something of this sort? That examples should test a single feature/behavior from one angle, not all angles.
In that particular case I would write both the positive and the negative tests. This is because you really want to make sure that people with valid usernames are allowed to have them, and that people who attempt to use invalid usernames can't do so.
Also, this way, if a username that should or shouldn't be valid comes through as the opposite of what it should be, you'll already have those tests in place; it's then a simple matter of adding a failing test to the correct category, confirming that the test does indeed fail, fixing the code, and confirming that the test then passes.
So yes, test for both in this case. Not simply one or the other.
I find in any situation like this, it can help to realise that what you're doing isn't really testing. You're providing examples of how / why to use the class and some descriptions of its behavior. If you need more than one example to anchor valuable behavior, I think it's OK to include both.
So, for instance, if I was describing the behavior of a list, I'd have two examples to describe "The list should tell me if it's empty". Neither the empty example nor the full example are valuable on their own.
On the other hand, if you have a default situation in which something is valid, followed by a number of exceptional cases, that "valid" situation is independently valuable. There may be other situations you discover later, for instance:
should be invalid for non-alphanumerics
should be invalid for names already taken
should be invalid for numbers only
should be valid for accented letters
etc.
In this case, your behavior has two examples by coincidence, rather than because they form two sides of a valuable aspect of behavior. The valid behavior is valuable on its own. So I would have one example per test in this case, but one aspect of behavior per test generally.
This can apply to other, non-boolean behavior too. For instance, if I'm writing ATM software, I would want to both provide cash and debit the account. Neither behavior is valuable without the other.
"One assertion per test" is a great rule of thumb. I find it can be overused, and sometimes there's a case for "one aspect of behavior per test" instead. This isn't one of those cases, but I thought it worth mentioning anyway.
This pattern is usually known as "One Assertion Per Test":
http://blog.jayfields.com/2007/06/testing-one-assertion-per-test.html