Hard-coded vs Soft-coded expected values for Integration Testing - testing

I understand that in Unit Test, hard-coded is preferred as we don't want to write much code as possible.
For Integration Testing however, is the same principle still applied? For a context, here's a common scenario:
TodoController
TodoService
Serializer/Transformer/Whatever you call it
Here's an imaginary code on how both approach might look like:
Hard Coded
// Arrange
$note = new Note('note123', 'John', 'example message');
$this->noteRepository->save($note);
// Act
$response = $this->json('GET', '/api/notes');
// Assert
$response->seeJsonEquals([
'id' => 'note123',
'author' => 'John',
'message' => 'example message'
]);
Soft Coded (Computed)
// Arrange
$note = new Note('note123', 'John', 'example message');
$this->noteRepository->save($note);
// Act
$response = $this->json('GET', '/api/notes');
// Assert
$noteSerializer = new NoteSerializer();
$response->seeJsonEquals($noteSerializer->serialize($note));
The only catch for the soft-coded approach is that if the Serializer is the problem, the test will still pass because both the controller and expected value uses it.
However, we can solve that by creating another test for the Serializer.
We might have lots of mini-tests but I think it'll us save lots of time compared to hard-coding. If we changed our response structure, the hard-coded tests will need some changes as well but the soft-coded one needs to change in its own test only.
I might be missing something but I already tried to Google it and all I see are always about Unit Tests so thought of asking if same principle is still applied in Integration Tests as well

TL;DR: yes, it makes perfect sense to have integration tests that assume that other testing strategies are sharing the responsibility for detecting mistakes.
I think you'll find that there are two different ideas here, that are getting mixed.
One problem is independent verification. There are lots of tests that you can run to demonstrate that a given solution is internally consistent, but that's not equivalent to to demonstrating that a given solution is correct. The latter usually requires querying the test subject for data and then doing an independent evaluation.
UltimateAnswer lifeTheUniverseAndEverything = deepThought()
// Compare this
assertEquals(new UltimateAnswer(42), lifeTheUniverseAndEverything);
// to this
assertEquals(42, lifeTheUniverseAndEverything.toInt());
What counts as independent? I consider it to be a fuzzy line -- if we had enough tests in place to have some arbitrary number of nines confidence in UltimateAnswer::equals, then it might be fine to treat that verification as independent. On the other hand, I've been burned at least twice by using domain agnostic primitives to "independently" verify that things worked, only to discover I was actually performing a dependent verification, and the tests was failing to catch the bug I expected it to.
A second problem is over fitting -- it is often the case that a number of distinguishable behaviors may all be satisfactory. Example: what should be the result of List.shuffle()? If the tests are intended to describe your requirements, then they are going to be more forgiving than tests which document example behaviors.
Strictly fitted tests are fantastic when your primary activity is refactoring, and you are trying to verify that the change that you made does indeed preserve the precise behavior of your system. They can be lousy when testing a new system with a small core deviation in behavior that shows up everywhere (consider tests that verify output strings, after the date formatting requirements get changed).
To my mind, neither of these concerns is particularly different for "Integration Tests" vs "Unit Tests". Admittedly, part of the problem is that its never particularly clear which definition of these ideas another person is working from.
In most cases, the different kinds of tests have different trade offs. We want our verification to be cost effective. So we're probably going to have a layered testing strategy, where the kinds of checks we implement depend on the context.

In your particular example using of NoteSerializer did not make sense, because your assertions build with same code as implementation.
Would you call this test valuable?
// Arrange
$original = 42;
$expected = $original + 100;
// Act
$actual = $original + 100;
// Assert
this-> assertEquals($expected, $actual);
Problem with using NoteSerializer for building expected value is that, as you already noticed, if serializer is broken - tests will remain green.
Instead you can deserialize received response into a class and compare it to original $note

Related

Is it possible for a program cannot find the failure by using dynamic testing, but have fault?

Is it possible for a program cannot find the failure by using dynamic testing, but have fault? any simple example?
Please help! thanks.
Yes. Testing can only prove the absence of bugs for what you tested. Dynamic testing cannot cover all possible inputs and outputs in all environments with all dependencies.
First is to simply not test the code in question. This can be verified by checking the coverage of your test. Even if you achieve 100% coverage there can still be flaws.
Next is to not check all possible types and ranges of inputs. For example, if you have a function that scans for a word in a string, you need to check for...
The word at the start of the string.
The word at the end of the string.
The word in the middle of the string.
A string without the word.
The empty string.
These are known as boundary conditions and include things like:
0
Negative numbers
Empty strings
Null
Extremely large values
Decimals
Unicode
Empty files
Extremely large files
If the code in question keeps state, maybe in an object, maybe in global variables, you have to test that state does not become corrupted or interfere with subsequent runs.
If you're doing parallel processing you must test any number of possibilities for deadlocks or corruption resulting from trying to do the same thing at the same time. For example, two processes trying to write to the same file. Or two processes both waiting for a lock on the same resource. Do they lock only what they need? Do they give up their locks ASAP?
Once you test all the ways the code is supposed to work, you have to test all the ways that it can fail, whether it fails gracefully with an exception (instead of garbage), whether an error leaves it in a corrupted state, and so on. How does it handle resource failure, like failing to connect to a database? This becomes particularly important working with databases and files to ensure a failure doesn't leave things partially altered.
For example, if you're transferring money from one account to another you might write:
my $from_balance = get_balance($from);
my $to_balance = get_balance($to);
set_balance($from, $from_balance - $amount);
set_balance($to, $to_balance + $amount);
What happens if the program crashes after the first set_balance? What happens if another process changes either balance between get_balance and set_balance? These sorts of concurrency issues must be thought of and tested.
There's all the different environments the code could run in. Different operating systems. Different compilers. Different dependencies. Different databases. And all with different versions. All these have to be tested.
The test can simply be wrong. It can be a mistake in the test. It can be a mistake in the spec. Generally one tests the same code in different ways to avoid this problem.
The test can be right, the spec can be right, but the feature is wrong. It could be a bad design. It could be a bad idea. You can argue this isn't a "bug", but if the users don't like it, it needs to be fixed.
If your testing makes use of a lot of mocking, your mocks may not reflect how thing thing being mocked actually behaves.
And so on.
For all these flaws, dynamic testing remains the best we've got for testing more than a few dozen lines of code.

How to quickly analyse the impact of a program change?

Lately I need to do an impact analysis on changing a DB column definition of a widely used table (like PRODUCT, USER, etc). I find it is a very time consuming, boring and difficult task. I would like to ask if there is any known methodology to do so?
The question also apply to changes on application, file system, search engine, etc. At first, I thought this kind of functional relationship should be pre-documented or some how keep tracked, but then I realize that everything can have changes, it would be impossible to do so.
I don't even know what should be tagged to this question, please help.
Sorry for my poor English.
Sure. One can technically at least know what code touches the DB column (reads or writes it), by determining program slices.
Methodology: Find all SQL code elements in your sources. Determine which ones touch the column in question. (Careful: SELECT ALL may touch your column, so you need to know the schema). Determine which variables read or write that column. Follow those variables wherever they go, and determine the code and variables they affect; follow all those variables too. (This amounts to computing a forward slice). Likewise, find the sources of the variables used to fill the column; follow them back to their code and sources, and follow those variables too. (This amounts to computing a backward slice).
All the elements of the slice are potentially affecting/affected by a change. There may be conditions in the slice-selected code that are clearly outside the conditions expected by your new use case, and you can eliminate that code from consideration. Everything else in the slices you may have inspect/modify to make your change.
Now, your change may affect some other code (e.g., a new place to use the DB column, or combine the value from the DB column with some other value). You'll want to inspect up and downstream slices on the code you change too.
You can apply this process for any change you might make to the code base, not just DB columns.
Manually this is not easy to do in a big code base, and it certainly isn't quick. There is some automation to do for C and C++ code, but not much for other languages.
You can get a bad approximation by running test cases that involve you desired variable or action, and inspecting the test coverage. (Your approximation gets better if you run test cases you are sure does NOT cover your desired variable or action, and eliminating all the code it covers).
Eventually this task cannot be automated or reduced to an algorithm, otherwise there would be a tool to preview refactored changes. The better you wrote code in the beginning, the easier the task.
Let me explain how to reach the answer: isolation is the key. Mapping everything to object properties can help you automate your review.
I can give you an example. If you can manage to map your specific case to the below, it will save your life.
The OR/M change pattern
Like Hibernate or Entity Framework...
A change to a database column may be simply previewed by analysing what code uses a certain object's property. Since all DB columns are mapped to object properties, and assuming no code uses pure SQL, you are good to go for your estimations
This is a very simple pattern for change management.
In order to reduce a file system/network or data file issue to the above pattern you need other software patterns implemented. I mean, if you can reduce a complex scenario to a change in your objects' properties, you can leverage your IDE to detect the changes for you, including code that needs a slight modification to compile or needs to be rewritten at all.
If you want to manage a change in a remote service when you initially write your software, wrap that service in an interface. So you will only have to modify its implementation
If you want to manage a possible change in a data file format (e.g. length of field change in positional format, column reordering), write a service that maps that file to object (like using BeanIO parser)
If you want to manage a possible change in file system paths, design your application to use more runtime variables
If you want to manage a possible change in cryptography algorithms, wrap them in services (e.g. HashService, CryptoService, SignService)
If you do the above, your manual requirements review will be easier. Because the overall task is manual, but can be aided with automated tools. You can try to change the name of a class's property and see its side effects in the compiler
Worst case
Obviously if you need to change the name, type and length of a specific column in a database in a software with plain SQL hardcoded and shattered in multiple places around the code, and worse many tables present similar column namings, plus without project documentation (did I write worst case, right?) of a total of 10000+ classes, you have no other way than manually exploring your project, using find tools but not relying on them.
And if you don't have a test plan, which is the document from which you can hope to originate a software test suite, it will be time to make one.
Just adding my 2 cents. I'm assuming you're working in a production environment so there's got to be some form of unit tests, integration tests and system tests already written.
If yes, then a good way to validate your changes is to run all these tests again and create any new tests which might be necessary.
And to state the obvious, do not integrate your code changes into the main production code base without running these tests.
Yet again changes which worked fine in a test environment may not work in a production environment.
Have some form of source code configuration management system like Subversion, GitHub, CVS etc.
This enables you to roll back your changes

In “Given-When-Then” style BDD tests, is it OK to have multiple “When”s conjoined with an “And”?

I read Bob Martin's brilliant article on how "Given-When-Then" can actual be compared to an FSM. It got me thinking. Is it OK for a BDD test to have multiple "When"s?
For eg.
GIVEN my system is in a defined state
WHEN an event A occurs
AND an event B occurs
AND an event C occurs
THEN my system should behave in this manner
I personally think these should be 3 different tests for good separation of intent. But other than that, are there any compelling reasons for or against this approach?
When multiple steps (WHEN) are needed before you do your actual assertion (THEN), I prefer to group them in the initial condition part (GIVEN) and keep only one in the WHEN section. This kind of shows that the event that really triggers the "action" of my SUT is this one, and that the previous one are more steps to get there.
Your test would become:
GIVEN my system is in a defined state
AND an event A occurs
AND an event B occurs
WHEN an event C occurs
THEN my system should behave in this manner
but this is more of a personal preference I guess.
If you truly need to test that a system behaves in a particular manner under those specific conditions, it's a perfectly acceptable way to write a test.
I found that the other limiting factor could be in an E2E testing scenario that you would like to reuse a statement multiple times. In my case the BDD framework of my choice(pytest_bdd) is implemented in a way that a given statement can have a singular return value and it maps the then input parameters automagically by the name of the function that was mapped to the given step. Now this design prevents reusability whereas in my case I wanted that. In short I needed to create objects and add them to a sequence object provided by another given statement. The way I worked around this limitation is by using a test fixture(which I named test_context), which was a python dictionary(a hashmap) and used when statements that don't have same singular requirement so the '(when)add object to sequence' step looked up the sequence in the context and appended the object in question to it. So now I could reuse the add object to sequence action multiple times.
This requirement was tricky because BDD aims to be descriptive. So I could have used a single given statement with the pickled memory map of the sequence object that I wanted to perform test action on. BUT would it have been useful? I think not. I needed to get the sequence constructed first and that needed reusable statements. And although this is not in the BDD bible I think in the end it is a practical and pragmatic solution to a very real E2E descriptive testing problem.

How to functionally test an extremely complex system?

I've got a legacy system that processes extremely complex data that's changing every second. The modularity of the system is quite poor so I can't split the business logic into smaller modules to ease functional testing.
The actual test system is: "close your eyes click and pray", which is not acceptable at all. I want to be confident on the changes we commit on the code.
What are the test good practices, the bibles to read, the changes to operate, to increase confidence in such a system.
The question is not about unit testing, the system wasn't designed for that and it takes too much time to decouple, mock and stub all the dependencies and most of all, we sadly don't have the time and budget for that. I don't want to a philosophic debate about functional testing: I want facts that work in real life.
It sounds like you have yourself a black box as regards testing.
http://en.wikipedia.org/wiki/Black-box_testing
To put it simply, it's horrible, but may be all you can do if you can't isolate the system in any way.
You need to insert known data into your system and compare the result with the known output.
You really need known data & output for
normal values - normal data - you'll find out that it can at least seem to do the right thing
erroneous values - spelling errors, invalid values - so you know that it will tell you if the input is rubbish
out of range - -1 on signed integers, values greater than 2.7 billion (assuming 32bit), and so on - so you know it won't crash out on seriously mis-inputted or corrupted data
dangerous - input that would break the SQL, simulate SQL injection
Lastly make sure that all errors are being carefully handled rather than getting logged and the bad/corrupt/null value getting passed on through the system.
Any processes you can isolate and test that way will make debugging easier, as black box testing can't tell you where the error occurred. This means then you need to diagnose the errors based on what happened, more in the style of House MD than a normal debugging session.
Once you have the different data types listed above, you can test all changes in isolation with them, and then in the system as a whole. Over time as you eventually touch most aspect of the system, you'll have test cases for all areas, and be able to say where failure was most likely to have occurred more easily.
Also: make sure you put tracers in your known data so you don't accidentally indicate a stockmarket crash when you're testing the range limits on a module, so you can take it out of the result flow before it ends up on a CEO's desk.
I hope that's some help
http://www.amazon.com/Working-Effectively-Legacy-Michael-Feathers/dp/0131177052 seems to be the book for these situations.

'assert' vs. 'verify' in Selenium

The checks Selenium performs usually come in two flavours: assertFoo and verifyFoo. I understand that assertFoo fails the whole testcase whereas verifyFoo just notes the failure of that check and lets the testcase carry on.
So with verifyFoo I can get test results for multiple conditions even if one of them fails. On the other hand, one failing check for me is enough to know, that my edits broke the code and I have to correct them anyway.
In which concrete situations do you prefer one of the two ways of checking over the other? What are your experiences that motivate your view?
I would use an assert() as an entry point (a "gateway") into the test. Only if the assertion passes, will the verify() checks be executed. For instance, if I'm checking the contents of a window resulting from a series of actions, I would assert() the presence of the window, and then verify() the contents.
An example I use often - checking the estimates in a jqgrid: assert() the presence of the grid, and verify() the estimates.
I've come across a few problems which were overcome by using
assert*()
instead of
verify*()
For example, in form validations if you want to check a form element, the use of verifyTrue(...); will just pass the test even if the string is not present in the form.
If you replace assert with verify, then it works as expected.
I strongly recommend to go with using assert*().
If you are running Selenium tests on a production system and want to make sure you are logged-in as a test user e.g., instead of your personal account, it is a good idea to first assert that the right user is logged in before triggering any actions that would have unintended effects, if used by accident.
Usually you should stick to one assertion per test case, and in this case the difference boils down to any tear-down code which must be run. But you should probably put this in an #After method anyway.
I've had quite a few problems with the verify*() methods in SeleneseTestBase (e.g. they use System.out.println(), and com.thoughtworks.selenium.SeleneseTestBase.assertEquals(Object, Object) just doesn't do what you expect) so I've stopped using them.