What is an error-guessing test case?

I need a simple explanation of what an error-guessing test case is. Is it dangerous to use? I would appreciate an example.
Best regards,
Erica

Error guessing is documented here: http://en.wikipedia.org/wiki/Error_guessing
It's a name for something that's very common -- guessing where errors might occur based on your previous experience.
For example, suppose you have a routine that determines whether a value entered by a user at a terminal is a prime number. You'd test the cases where errors tend to occur (a sketch in code follows the list):
Empty input
Values that are not integers (floating-point numbers, letters, etc.)
Boundary-case values (2, 3, 4)
etc.
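As a minimal sketch of those guesses in code (assuming a hypothetical is_prime routine that parses its text input; the routine and its error behaviour are illustrative, not from the question):

import pytest

def is_prime(text):
    n = int(text)  # raises ValueError for empty or non-integer input
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

@pytest.mark.parametrize("bad_input", ["", "3.14", "abc"])
def test_rejects_non_integer_input(bad_input):
    with pytest.raises(ValueError):
        is_prime(bad_input)

@pytest.mark.parametrize("n, expected", [("2", True), ("3", True), ("4", False)])
def test_boundary_cases(n, expected):
    assert is_prime(n) == expected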
I would assume that every tester/QA person would be asked questions like this during an interview. It gives you a chance to talk about what procedures you've used in the past during testing.
I think the method goes like this:
1. Do formal testing
2. Use knowledge gained during formal testing about how the system works to make a list of places where defects might be
3. Design tests to verify whether those defects exist.
By its nature this process is very ad-hoc and unstructured.


Do JBehave-based negative scenarios grow exponentially?

In behavior-based testing, it looks like the number of error scenarios grows exponentially.
As per Aslak Hellesøy, BDD was created to combine automated acceptance tests, functional requirements and software documentation.
In 2003 I became part of a small clique of people from the XP community who were exploring better ways to do TDD. Dan North named this BDD. The idea was to combine automated acceptance tests, functional requirements and software documentation into one format that would be understandable by non-technical people as well as testing tools.
Software development teams use JBehave as a tool for BDD testing (thanks to Dan North).
As there can be many possible negative options, it looks like the number of negative scenarios in a JBehave test suite can grow considerably. The time taken to run the test suite, as well as to modify the product, increases with this kind of growth. Especially, I feel the suite is becoming hard to maintain as documentation of the product.
I am not exactly sure whether this is an abuse of BDD/JBehave concepts due to misunderstandings by different teams, or maybe that is the way it should be.
Let me explain this concern with an example.
Say an application has a behavior to order an item via a REST service.
PUT /order
{
// JSON body with 3 mandatory parameters and 2 optional parameters
}
Happy scenario
Invoke REST endpoint with correct values for all 3 mandatory parameters
Invoke REST endpoint with correct values for all 5 parameters
Negative scenarios
There are a lot of negative scenarios that we can come up with.
Input value based scenarios
Mandatory parameter 1 is set to null, with correct values for the other two mandatory parameters (3 possible scenarios, one per mandatory parameter)
Mandatory parameter 1 is set to empty, with correct values for the other two mandatory parameters (3 possible scenarios, one per mandatory parameter)
Mandatory parameter 1 is set to a value in an invalid format, with correct values for the other two mandatory parameters (3 possible scenarios, one per mandatory parameter)
Mandatory parameter 1 & 2 are set to null, with correct value for other mandatory parameter (2 possible scenarios)
Likewise, we can write 3^3 scenarios just for those three parameters, and the count grows exponentially with the number of parameters (a counting sketch follows below).
Then we can bring the optional parameters into the equation as well and come up with more scenarios (say, optional parameters with null, empty, and invalid-format values).
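To make the growth concrete, here is a small counting sketch. It tallies every combination of one valid and three invalid states per parameter, a slightly different tally than the 3^3 above, but it makes the exponential trend obvious:

from itertools import product

# Each mandatory parameter can be valid, null, empty, or invalid-format.
states = ["valid", "null", "empty", "invalid-format"]
combos = list(product(states, repeat=3))
print(len(combos))    # 64 combinations for 3 parameters; 4**n in general
negative = [c for c in combos if any(s != "valid" for s in c)]
print(len(negative))  # 63 of them are negative scenarios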
Payment ability based scenarios
Based on available money, there will be scenarios.
Delivery location based scenarios
Based on delivery possibilities, there will be scenarios.
Question/Concern
I would like to learn whether all these negative scenarios (and more) should be part of a JBehave-based test suite. If that is the case, any advice/thoughts on how to make it more maintainable?
It helps a lot to know what the tested application does internally in its own validation processes, specifically the order of validation.
In a simplified example of three required parameters, you really only need three scenarios: one for each parameter. If you know that the application will fail if parameter one is invalid, you don't need to check that again when you test parameter two in another scenario, since the second parameter would never be validated after the first one fails. So instead of three times three, you simply have three:
1) invalid, valid, valid.
2) valid, invalid, valid.
3) valid, valid, invalid.
That is, unless the application DOES check all three parameters and reports accordingly that one or more parameters were invalid. Speaking as a developer who now does automation, I can tell you that unless I thought multiple invalid parameters were a high probability, I would only check parameters one at a time and fail with an error upon the first invalid parameter. Having written accounting software, there were times where it was logical to validate all parameters and report accordingly, but that was the exception rather than the rule. If you know what the application is checking, and in what order, you can write better test scripts, but I realize that is not always possible.
There is still the question of the seemingly limitless kinds of invalid data, so even in my simplified example you could have lots of tests. But that can be dealt with by parameterizing the invalid values: you can still limit yourself to just three scenarios, each driven by any number of invalid values (see the sketch below).
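As a hedged sketch of that parameterization idea (expressed with pytest rather than JBehave for brevity; put_order, the field names, and the validation rule are all made up for illustration):

import pytest

VALID = {"item": "book", "qty": "2", "address": "somewhere"}  # illustrative
INVALID_VALUES = [None, "", "@@bad-format@@"]  # null, empty, invalid format

def put_order(payload):
    # Stand-in for the real PUT /order endpoint: rejects a null, empty,
    # or badly formatted mandatory parameter with a 400.
    for key in VALID:
        value = payload.get(key)
        if value is None or value == "" or "@" in str(value):
            return 400
    return 200

# One scenario per mandatory parameter, parameterized over the kinds of
# invalid value: 3 x 3 = 9 cases instead of every combination.
@pytest.mark.parametrize("field", sorted(VALID))
@pytest.mark.parametrize("bad_value", INVALID_VALUES)
def test_single_invalid_parameter_is_rejected(field, bad_value):
    payload = dict(VALID, **{field: bad_value})
    assert put_order(payload) == 400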
I hope I understood your question correctly and offered some useful information.

In “Given-When-Then” style BDD tests, is it OK to have multiple “When”s conjoined with an “And”?

I read Bob Martin's brilliant article on how "Given-When-Then" can actually be compared to an FSM. It got me thinking. Is it OK for a BDD test to have multiple "When"s?
For example:
GIVEN my system is in a defined state
WHEN an event A occurs
AND an event B occurs
AND an event C occurs
THEN my system should behave in this manner
I personally think these should be 3 different tests for good separation of intent. But other than that, are there any compelling reasons for or against this approach?
When multiple steps (WHEN) are needed before you do your actual assertion (THEN), I prefer to group them in the initial-condition part (GIVEN) and keep only one in the WHEN section. This shows that the event that really triggers the "action" of my SUT is that one, and that the previous ones are merely steps to get there.
Your test would become:
GIVEN my system is in a defined state
AND an event A occurs
AND an event B occurs
WHEN an event C occurs
THEN my system should behave in this manner
but this is more of a personal preference I guess.
If you truly need to test that a system behaves in a particular manner under those specific conditions, it's a perfectly acceptable way to write a test.
I found that another limiting factor can be the need to reuse a statement multiple times in an E2E testing scenario. The BDD framework of my choice (pytest_bdd) is implemented so that a given statement has a single return value, which is mapped to step parameters automagically by the name of the function bound to the given step. This design prevents reusability, which in my case I wanted: I needed to create objects and add them to a sequence object provided by another given statement. I worked around the limitation with a test fixture (which I named test_context), a Python dictionary (a hashmap), and used when statements, which don't have that single-value requirement. The "(when) add object to sequence" step looked up the sequence in the context and appended the object in question to it, so I could reuse the add-object-to-sequence action multiple times.
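Here is a minimal sketch of that test_context workaround (the feature file name and the step wordings are made up; only the fixture-sharing pattern is the point):

import pytest
from pytest_bdd import given, when, then, scenario

@scenario("orders.feature", "Build up a sequence across reusable steps")
def test_build_sequence():
    pass

@pytest.fixture
def test_context():
    # Shared mutable state that any step can read or extend, sidestepping
    # the one-return-value-per-given limitation described above.
    return {}

@given("an empty sequence")
def empty_sequence(test_context):
    test_context["sequence"] = []

@when("an object is added to the sequence")
def add_object(test_context):
    # Reusable: this step can appear any number of times in one scenario.
    test_context["sequence"].append(object())

@then("the sequence holds every added object")
def sequence_holds_objects(test_context):
    assert len(test_context["sequence"]) > 0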
This requirement was tricky because BDD aims to be descriptive. I could have used a single given statement with a pickled memory map of the sequence object I wanted to perform the test action on. But would that have been useful? I think not. I needed to get the sequence constructed first, and that needed reusable statements. Although this is not in the BDD bible, I think it is, in the end, a practical and pragmatic solution to a very real E2E descriptive-testing problem.

How to functionally test an extremely complex system?

I've got a legacy system that processes extremely complex data that's changing every second. The modularity of the system is quite poor so I can't split the business logic into smaller modules to ease functional testing.
The actual test system is "close your eyes, click, and pray", which is not acceptable at all. I want to be confident in the changes we commit to the code.
What are the good testing practices, the bibles to read, the changes to make, to increase confidence in such a system?
The question is not about unit testing: the system wasn't designed for that, it would take too much time to decouple, mock, and stub all the dependencies, and most of all we sadly don't have the time and budget for that. I don't want a philosophical debate about functional testing: I want facts that work in real life.
It sounds like you have yourself a black box as regards testing.
http://en.wikipedia.org/wiki/Black-box_testing
To put it simply, it's horrible, but may be all you can do if you can't isolate the system in any way.
You need to insert known data into your system and compare the result with the known output.
You really need known data and expected output for:
normal values - normal data - you'll find out that it can at least seem to do the right thing
erroneous values - spelling errors, invalid values - so you know that it will tell you if the input is rubbish
out of range - -1 for unsigned integers, values greater than about 2.1 billion for 32-bit signed integers, and so on - so you know it won't crash on seriously mis-entered or corrupted data
dangerous - input that would break the SQL, simulate SQL injection
Lastly, make sure that all errors are carefully handled, rather than merely being logged while the bad/corrupt/null value is passed on through the system.
Any processes you can isolate and test that way will make debugging easier, as black-box testing can't tell you where the error occurred. This means you then need to diagnose errors based on what happened, more in the style of House MD than a normal debugging session.
Once you have the different data types listed above, you can test all changes in isolation with them, and then in the system as a whole. Over time, as you eventually touch most aspects of the system, you'll have test cases for all areas and will more easily be able to say where a failure most likely occurred.
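A golden-master style sketch of the known-input/known-output idea (run_system stands in for the legacy entry point; the file names and record format are assumptions, not part of the original answer):

import json

def run_system(record):
    # Stand-in for the black box: replace with a call into the real system.
    return {"total": sum(record.get("values", []))}

def test_against_golden_master():
    with open("known_inputs.json") as f:   # normal, erroneous, out-of-range,
        cases = json.load(f)               # and dangerous records, by name
    with open("known_outputs.json") as f:  # outputs captured from a
        expected = json.load(f)            # known-good run of the system
    for name, record in cases.items():
        assert run_system(record) == expected[name], f"mismatch in case {name}"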
Also: make sure you put tracers in your known data so you don't accidentally indicate a stock-market crash when you're testing the range limits on a module, and so you can take it out of the result flow before it ends up on a CEO's desk.
I hope that's some help.
Working Effectively with Legacy Code by Michael Feathers (http://www.amazon.com/Working-Effectively-Legacy-Michael-Feathers/dp/0131177052) seems to be the book for these situations.

What is the best practice for writing RSpec tests/examples? Write examples to test the positive/affirmative, the negative, or both?

Fairly new to BDD and RSpec, and I'm really curious as to what people typically do when writing their RSpec tests/examples, specifically as it relates to positive and negative tests of the same thing.
Take for example validation for a username and the rule that a valid username contains only alphanumeric characters.
The affirmative/positive test would be something like this:
it "should be valid if it contains alphanumeric characters"
username = 'abc123'
username.should be_valid
end
While the negative test would be something like this:
it "should be invalid if it contains non-alphanumeric characters"
username = '%as.12-'
username.should_not be_valid
end
Would you write one test, but not the other? Would you write both? Would you put them together in the same test? I've seen examples where people do any of the above, so I was wondering if there is a best practice and if so, what is it?
Example of writing both positive and negative test:
it "should be invalid if it contains non-alphanumeric characters"
username = '%as.12-'
username.should_not be_valid
username = 'abc123'
username.should be_valid
end
I've seen examples of people doing it this way, but I'm honestly not a fan of this approach. I tend to err on the side of keeping things clean and distinct with a single purpose, much like how we should write methods, so I would be more likely to write two separate tests instead of putting them together in one. So is there a best practice that states something of this sort? That examples should test a single feature/behavior from one angle, not all angles.
In that particular case I would write both the positive and the negative test. This is because you really want to make sure that people with valid usernames are allowed to have them, and that people who attempt to use invalid usernames can't.
Also, this way, if a username that should or shouldn't be valid comes through as the opposite of what it should be, you'll already have those tests; it's then a simple matter of adding a failing test to the correct category, confirming that the test does indeed fail, fixing the code, and confirming that the test then passes.
So yes, test for both in this case. Not simply one or the other.
I find in any situation like this, it can help to realise that what you're doing isn't really testing: you're providing examples of how and why to use the class, together with some descriptions of its behavior. If you need more than one example to anchor valuable behavior, I think it's OK to include both.
So, for instance, if I was describing the behavior of a list, I'd have two examples to describe "The list should tell me if it's empty". Neither the empty example nor the full example is valuable on its own.
On the other hand, if you have a default situation in which something is valid, followed by a number of exceptional cases, that "valid" situation is independently valuable. There may be other situations you discover later, for instance:
should be invalid for non-alphanumerics
should be invalid for names already taken
should be invalid for numbers only
should be valid for accented letters
etc.
In this case, your behavior has two examples by coincidence, rather than because they form two sides of a valuable aspect of behavior. The valid behavior is valuable on its own. So I would have one example per test in this case, but one aspect of behavior per test generally.
This can apply to other, non-boolean behavior too. For instance, if I'm writing ATM software, I would want to both provide cash and debit the account. Neither behavior is valuable without the other.
"One assertion per test" is a great rule of thumb. I find it can be overused, and sometimes there's a case for "one aspect of behavior per test" instead. This isn't one of those cases, but I thought it worth mentioning anyway.
This pattern is usually known as "One Assertion Per Test":
http://blog.jayfields.com/2007/06/testing-one-assertion-per-test.html

FIFO semaphore test

I have implemented FIFO semaphores, but now I need a way to test/prove that they are working properly. A simple test would be to create some threads that wait on a semaphore and then print a message with a number: if the numbers are in order, it should be FIFO. But this is not good enough to prove it, because that order could have occurred by chance. Thus, I need a better way of testing it.
If necessary, locks or condition variables can be used too.
Thanks
What you describe with your sentence "but this is not good enough to prove it because that order could have occurred by chance" is a known dilemma.
1) Even if you have a specification, you cannot ensure that the specification matches your intention. To illustrate this I will take an example from "The Limits of Correctness". Consider a specification for a factorization function:
Compute A and B such that A * B = C
But that's not enough, as you could have an implementation that returns A=1 and B=C. Adding A,B != 1 can still lead to A=-1 and B=-C, so the correct specification must state A,B > 1. That's just to illustrate how complicated it can be to write a specification that matches the real intention.
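A two-line sketch makes the hole in the naive specification visible (factor here is a deliberately useless implementation, not anyone's real code):

def factor(c):
    return 1, c      # satisfies "A * B == C" without factoring anything

a, b = factor(91)
assert a * b == 91   # the weak spec passes; only A, B > 1 rules this out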
2) Even having proved an algorithm correct still doesn't mean the implementation is correct in practice. This is best illustrated by this quote from Donald Knuth:
Beware of bugs in the above code; I have only proved it correct, not tried it.
3) Testing can only reveal the presence of bugs, not their absence. This quote goes back to Dijkstra:
Testing can be used to show the presence of bugs, but never to show their absence.
Conclusion: you are doomed, and you will never be 100% sure that your code is correct according to its intent! But things aren't that bad: having high confidence in the code is usually enough. For instance, if using multiple threads is still not enough for you, you can use fuzzing as well, so as to randomize the test execution even more. If your tests always pass, well, you can be pretty confident that your code is good.
because that order could have occurred by chance.
You can run the test a few times, e.g. 10, and check that the order was correct each time. This makes it very unlikely that the ordering occurred by chance, although strictly speaking it cannot prove it; see the sketch below.
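A sketch of such a repeated-ordering test (FifoSemaphore is the hypothetical class under test, assumed to expose acquire()/release(); the sleep-based arrival ordering is crude and flagged as such in the comments):

import threading
import time

def run_once(sem_factory, n_threads=5):
    sem = sem_factory()
    sem.acquire()                # hold the semaphore so all workers queue up
    wakeups = []
    lock = threading.Lock()

    def worker(i):
        sem.acquire()
        with lock:
            wakeups.append(i)
        sem.release()            # wake the next queued thread

    threads = []
    for i in range(n_threads):
        t = threading.Thread(target=worker, args=(i,))
        t.start()
        time.sleep(0.05)         # crude: give thread i time to block in
        threads.append(t)        # acquire() before thread i+1 starts

    sem.release()                # release the first queued thread
    for t in threads:
        t.join()
    return wakeups == list(range(n_threads))

# Repeat to make an accidental FIFO ordering very unlikely:
# assert all(run_once(FifoSemaphore) for _ in range(10))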
P.S. Multiple threads in a unit test are usually avoided.