How to develop regression tests for a calculation engine - testing

I'm on a team developing a financial information web app. We haven't written many automated tests for it yet, so we've decided to add regression tests to the most critical parts of our program. I'm very new to automated testing, though, so I'm not entirely sure how I should go about writing the tests.
This post is long, so here's the tl;dr question: How can I write a regression test that checks whether a certain calculation is working? I don't just want to test the calculation, though - I also want to know if any of the components the calculation depends on for its inputs break. I don't need to know which component broke in particular, just that something's not working. What approach should I use?
This is our situation: We developed the app using a layered architecture, like this:
Views
|
V
Logic Managers <--> Financial Calculation Engines
|
V
Data Accessors
|
V
Database
We've determined that the calculation engines are the parts of our program that most need to have a regression test suite. These components contain the calculations and algorithms that we use to process raw financial data into useful results. Their corresponding Managers use them by calling their public methods, which accept raw financial data as parameters. When the engine methods return, they send back an object that contains processed financial results. The managers, meanwhile, get the raw financial data from the data accessors, which in turn get data from the database.
We decided we want to know as soon as a financial calculation "breaks" so that we know the bug is somewhere in whichever pieces of the program have been touched since the last run of the tests. This would let us use continuous testing to protect us from having the engines producing wrong results and having no idea where to look.
When we thought about what this means, we realized that adding a unit test to each of the engines isn't enough. Let's say, for example, that an erroneous change to the data accessors means that they start pulling the wrong data. This data would then be sent up through the manager to the engine, which would produce the wrong results. However, the engine's algorithms themselves would still be working perfectly, so the unit test would still pass. This means that when we noticed the wrong numbers being generated, we would have no way of knowing when the bug was introduced, making it more difficult to track down and fix.
Instead, we would like to make regression tests that are able to pick up as soon as a bug appears anywhere that would cause the final results the engines output to be incorrect, even if the problem is that the wrong data is sent to the engines and not that the engines themselves have issues. When these tests fail, they wouldn't tell us where the problem is, but if we're continuously testing, we'll know as soon as a bug is checked in and have a small set of changes to look through to fix it.
So that's what we want to do. Unfortunately, we don't know how to create these tests. What approaches or patterns are useful for writing these types of regression tests?

Just a hint: you should check every part of the Financial Calculations Engine with the same inputs every time, and the objects returned should be identical every time.
Test separately the Data Accessors, with the same logic: same input, same output.
To do so, you need to mock some parts of the system (e.g., mock the data accessors to always return the same set of data).
Having separate unit tests for each part also locates the bug with more precision.
A couple of links to get into the idea:
http://www.ibm.com/developerworks/library/j-mocktest/index.html
http://www.slideshare.net/joewilson123/unit-testing-and-mocking
There are a lot of mocking frameworks around that can help you code the tests, like Mockito for Java projects.
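As a rough illustration of the "same input, same output" idea, here is a minimal sketch in Python, using the standard library's unittest.mock rather than Mockito. The names PortfolioManager, interest_engine and fetch_balances are hypothetical stand-ins for the layers in the question, not anything from the real application.

```python
from unittest.mock import Mock


def interest_engine(balances, rate):
    """Hypothetical calculation engine: simple interest per account."""
    return {account: round(amount * rate, 2) for account, amount in balances.items()}


class PortfolioManager:
    """Hypothetical logic manager: pulls raw data and hands it to the engine."""

    def __init__(self, accessor):
        self.accessor = accessor

    def yearly_interest(self, customer_id, rate):
        balances = self.accessor.fetch_balances(customer_id)
        return interest_engine(balances, rate)


def test_yearly_interest_is_stable():
    # The mocked accessor always returns the same raw data,
    # so the engine output must be identical on every run.
    accessor = Mock()
    accessor.fetch_balances.return_value = {"savings": 1000.0, "checking": 250.0}

    result = PortfolioManager(accessor).yearly_interest(customer_id=42, rate=0.05)

    assert result == {"savings": 50.0, "checking": 12.5}
    accessor.fetch_balances.assert_called_once_with(42)


test_yearly_interest_is_stable()
```

With the accessor mocked, this test only guards the manager and the engine. To also catch a broken accessor, as the question asks, the same assertion could presumably be run with the real accessor against a database loaded with fixed test data.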

Related

Test-Automation using MetaProgramming

I want to learn test automation using metaprogramming. I googled it but could not find anything. Can anybody suggest some resources where I can get information about how to use metaprogramming to make test automation easier?
That's a broad topic and not a lot has been written about it, because of the "dark corners" of metaprogramming.
What do you mean by "metaprogramming"?
As background, I consider metaprogramming to be any activity in which a tool (which we call a "metaprogramming tool") is used to inspect or modify the application software to achieve some effect.
Many people consider "reflection" to be a kind of metaprogramming; others consider (C++-style) templates to be metaprogramming; some suggest aspect-oriented programming.
I sort of agree but think these are weak versions of what you want, because each has severe limits on what it can see or do to source code. What you really want is a metaprogramming tool that has access to everything in your source program (yes, comments too!). Such tools are called Program Transformation Systems (PTS); they work by parsing the source code and operating on the parsed representation of the program. (I happen to build one of these; see my bio.) PTSes can then analyze the code accurately, and/or make reliable changes to the code and regenerate valid source with the changes. PS: a PTS can implement all those other metaprogramming techniques as special cases, so it is strictly more general.
Where can you use metaprogramming for testing?
There are at least 3 areas in which metaprogramming might play a role:
1) Collection of information from tests
2) Generation of tests
3) Avoidance of tests
Collection.
Collection of test results depends on the nature of the tests. Many tests are focused on "is this white/black box functioning correctly?" Assuming the tests are written somehow, they have to have access to the box under test, be able to invoke that box in a realistic way, determine if the result is correct, and often tabulate the results so that post-testing quality assessments can be made.
Access is the first problem. The black box to be tested may not be easily accessible to a testing framework: it may be driven by a UI event, sit in a non-public routine, or be buried deep inside another function where it is hard to get at.
You may need metaprogramming to "temporarily" modify the program to provide access to the box that needs testing (e.g., change a Private method to Public so it can be called from outside). Such changes exist only for the duration of the test project; you throw the modified program away because nobody wants it for anything but the test results. Yes, you have to ensure that the code transformations applied to make things visible don't change the program functionality.
The second problem is exercising the targeted black box in a realistic environment. Each code module runs in a world in which it assumes data and the environment are "properly" configured. The test program can set up that world explicitly by making calls on lots of the program elements or using its own custom code; this is usually the bulk of a test routine, and this code is hard to write and fragile (the application under test keeps changing; so do its assumptions about the world). One might use metaprogramming to instrument the application to collect the environment under which a test might need to run, thus avoiding the problem of writing all the setup code.
Finally, one might want to record more than just "test failed/passed". Often it is useful to know exactly what code got tested ("test coverage"). One can instrument the application to collect what-got-executed data; here's how to do it for code blocks: http://www.semdesigns.com/Company/Publications/TestCoverage.pdf using a PTS. More sophisticated instrumentation might be used to capture information about which paths through the code have been executed. Uncovered code, and/or uncovered paths, show where tests have not been applied and you arguably know nothing about what the program does, let alone whether it is buggy in a straightforward way.
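To make the "collect what-got-executed data" idea concrete, here is a small sketch in Python. It uses the interpreter's built-in sys.settrace hook instead of a program transformation system, so it is a runtime stand-in for the source-level instrumentation described above, not the method from the linked paper. The function classify and the helper run_with_line_coverage are made up for the example.

```python
import sys


def classify(x):
    """Hypothetical function under test with two branches."""
    if x < 10:
        return x + 1
    return x * x


def run_with_line_coverage(func, *args):
    """Run func and return (result, set of executed line offsets inside func)."""
    executed = set()
    code = func.__code__

    def tracer(frame, event, arg):
        # Only follow frames that execute the function under test.
        if frame.f_code is code:
            if event == "line":
                executed.add(frame.f_lineno - code.co_firstlineno)
            return tracer
        return None

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # always restore whatever tracer was active
    return result, executed


_, small_lines = run_with_line_coverage(classify, 3)
_, large_lines = run_with_line_coverage(classify, 12)
# Lines hit by only one of the two runs show which branch each input exercised.
only_small = small_lines - large_lines
only_large = large_lines - small_lines
```

Offsets are counted from the def line, so offset 3 is the x + 1 branch and offset 4 is the x * x branch. A test suite that never produces one of those offsets has left that branch untested.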
Generation of tests
Someone/thing has to produce tests; we've already discussed how to produce the set-up-the-environment part. What about the functional part?
Under the assumption that the program has been debugged (e.g., already tested by hand and fixed), one could use metaprogramming to instrument the code to capture the results of execution of a black box (e.g., instance execution post-conditions). By exercising the program, one can then produce (by definition) "correctly produced" results which can be transformed into a test. In this way, one might construct a huge variety of regression tests for an existing program; these will be valuable in verifying that further enhancements to the program don't break most of its functionality.
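A lightweight way to try this capture-and-replay idea, sketched in Python with a decorator standing in for real source instrumentation. The function net_price and the helpers record_calls and replay are hypothetical names invented for the example.

```python
import functools

recorded_cases = []


def record_calls(func):
    """Wrap func so every call appends (args, kwargs, result) to recorded_cases."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        recorded_cases.append((args, kwargs, result))
        return result
    return wrapper


def net_price(gross, tax_rate):
    """Hypothetical, already-debugged function whose behaviour we want to pin down."""
    return round(gross / (1 + tax_rate), 2)


# Step 1: exercise the instrumented program and capture its behaviour.
instrumented = record_calls(net_price)
for gross, rate in [(119.0, 0.19), (100.0, 0.0), (54.5, 0.09)]:
    instrumented(gross, rate)


# Step 2: replay the captured cases as regression tests against the current code.
def replay(func, cases):
    """Return the recorded cases whose result no longer matches."""
    return [(a, k, expected) for a, k, expected in cases if func(*a, **k) != expected]


assert replay(net_price, recorded_cases) == []
```

In practice the recorded cases would be written to a file and checked in, so that later versions of the code are replayed against results captured from the trusted version.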
Often a function has qualitatively different behaviors on different ranges of input (e.g., for x<10, it produces x+1, else it produces x*x). Ideally one would like to provide a test for each qualitatively different result (e.g., x<10, x>=10), which means one would like to partition the input ranges. Metaprogramming can help here, too, by enumerating all (partial) paths through a module and providing the predicate that controls each path.
The separate predicates each represent the input space partition of interest.
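For the x<10 example, the partitions and their tests might look like the following Python sketch. Here the predicates are written by hand; the point of the metaprogramming approach is that a tool would derive them from the paths through the code. All names are invented for the example.

```python
def f(x):
    """The example from the text: x + 1 below 10, x * x otherwise."""
    return x + 1 if x < 10 else x * x


# One entry per input partition: a predicate and hand-computed (input, expected) pairs.
# The cases deliberately include both sides of the boundary at 10.
partitions = {
    "x < 10": (lambda x: x < 10, [(-5, -4), (0, 1), (9, 10)]),
    "x >= 10": (lambda x: x >= 10, [(10, 100), (11, 121)]),
}


def check_partitions():
    """Return the number of cases checked; raise AssertionError on any mismatch."""
    checked = 0
    for name, (predicate, cases) in partitions.items():
        for x, expected in cases:
            assert predicate(x), f"{x} does not belong to partition {name}"
            assert f(x) == expected, f"f({x}) should be {expected}"
            checked += 1
    return checked


check_partitions()
```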
Avoidance of Tests
One only tests code one does not trust (surely you aren't testing the JDK?). Any code constructed by a reliable method doesn't need tests (the JDK was constructed this way, or at least Oracle is happy to have you believe it).
Metaprogramming can be used to automatically generate code from specifications or DSLs, in reliable ways. Such generated code is correct-by-construction (we can argue about what degree of rigour), and doesn't need tests. You might need to test that the DSL expression achieves the functionality you desired, but you don't have to worry about whether the generated code is right.

What tools exist for managing a large suite of test programs?

I apologize if this has been answered before, but I'm having trouble finding a tool that fits my needs.
I have a few dozen test programs, but each one can be run with a large number of parameters. I need to be able to automatically run sweeps of many of the parameters across all or some of the test programs. I have my own set of tools for running an individual test, which I can't really change, but I'm looking for a tool that would manage the entire suite.
Thus far, I've used a home-grown script for this. The main problem I run across is that an individual test program might take 5-10 parameters, each with several values. Although it would be easy to write something that would just do a nested for loop and sweep over every parameter combination, the difficulty is that not every combination of parameters makes sense, and not every parameter makes sense for every test program. There is no general way (i.e., that works for all parameters) to codify what makes sense and what doesn't, so the solutions I've tried before involve enumerating each sensible case. Although the enumeration is done with a script, it still leads to a huge cross-product of test cases which is cumbersome to maintain. We also don't want to run the giant cross-product of cases every time, so I have other mechanisms to select subsets of it, which gets even more cumbersome to deal with.
I'm sure I'm not the first person to run into a problem like this. Are there any tools out there that could help with this kind of thing? Or even ideas for writing one?
Thanks.
Adding a clarification ---
For instance, if I have parameters A, B, and C that each represent a range of values from 1 to 10, I might have a restriction like: if A=3, then only odd values of B are relevant and C must be 7. The restrictions can generally be codified, but I haven't found a tool where I could specify something like that. As for a home-grown tool, I'd either have to enumerate the tuples of parameters (which is what I'm doing) or implement something quite sophisticated to be able to specify and understand constraints like that.
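One possible shape for a home-grown version, sketched in Python: write each restriction as a predicate and filter the cross-product, instead of enumerating the sensible tuples by hand. This is only an illustration of the A/B/C example above; the names parameters, constraints and sensible_combinations are made up.

```python
from itertools import product

# Each parameter and its candidate values (the ranges from the example).
parameters = {"A": range(1, 11), "B": range(1, 11), "C": range(1, 11)}

# Each constraint is a predicate over one candidate combination.
# This one encodes: if A == 3, then B must be odd and C must be 7.
constraints = [
    lambda p: p["A"] != 3 or (p["B"] % 2 == 1 and p["C"] == 7),
]


def sensible_combinations(parameters, constraints):
    """Yield every parameter combination that satisfies all constraints."""
    names = list(parameters)
    for values in product(*(parameters[name] for name in names)):
        combo = dict(zip(names, values))
        if all(rule(combo) for rule in constraints):
            yield combo


cases = list(sensible_combinations(parameters, constraints))
a3_cases = [c for c in cases if c["A"] == 3]
```

Of the 1000 raw combinations, 905 survive: the 900 with A other than 3, plus the 5 with A=3, odd B and C=7. The full cross-product is still generated before filtering, so for very large sweeps a smarter generator would be needed. Selecting subsets could be handled the same way, with extra predicates.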
We rolled our own; we have a whole test infrastructure. It manages the tests and has a number of built-in features that let the tests log results. The infrastructure stores the logs in a searchable database for all kinds of report generation.
Each test has a class/structure that has information about the test, name of test, author, and a variety of other tags. When running a test suite you can run everything or run everything with a certain tag. So if you want to only test SRAM you can easily run only tests tagged sram.
Our tests are all considered either pass or fail. The pass/fail criteria are determined by the author of the individual test, but the infrastructure wants to see either pass or fail. You need to define what your possible results are: as simple as pass/fail, or you might want to add pass and keep going, pass but stop testing, fail but keep going, and fail and stop testing. "Stop testing" means that if there are 20 tests scheduled and test 5 fails, you stop; you don't go on to 6.
You need a mechanism to order the tests, which could be alphabetical, but it might benefit from a priority scheme (you must perform the power-on test before performing a test that requires the power to be on). It may also benefit from random ordering: some tests may be passing due to dumb luck because a test before them made something work; remove that prior test and this test fails. Or vice versa: this test passes until it is preceded by a specific test, and those two don't get along in that order.
To shorten my answer: I don't know of an existing infrastructure, but I have built my own and worked with home-built ones that were tailored to our business/lab/process. You won't hit a home run the first time; don't expect to. But try to predict a manageable set of rules for individual tests: how many types of pass/fail return values a test can return, the types of filters you want to put in place, the type of logging you may wish to do, and where you want to store that data. Then create the infrastructure and the mandatory shell/frame for each test; individual testers then have to work within that shell. Our current infrastructure is in Python, which lent itself to this nicely, and we are not restricted to Python-based tests: we can use C or Python, and the target can run whatever languages/programs it can run. Abstraction layers are good. We use a simple read/write of an address to access the unit under test, and with that we can test against a simulation of the target or against real hardware when the hardware arrives. We can access the hardware through a serial debugger, JTAG, or PCIe, and the majority of the tests don't know or care because they are on the other side of the abstraction.
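The per-test shell with tags, priorities, stop-on-fail and random ordering could be sketched in Python along these lines. This is not the infrastructure described above, just a minimal illustration; CaseInfo, run_suite and the sample test names are hypothetical.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class CaseInfo:
    name: str
    run: Callable[[], bool]          # returns True for pass, False for fail
    tags: Set[str] = field(default_factory=set)
    priority: int = 100              # lower numbers run first
    stop_on_fail: bool = False       # abort the rest of the suite on failure


def run_suite(tests: List[CaseInfo], tag: Optional[str] = None,
              shuffle_seed: Optional[int] = None):
    """Run the selected tests and return a list of (name, passed) results."""
    selected = [t for t in tests if tag is None or tag in t.tags]
    if shuffle_seed is not None:
        # Random order within the same priority helps expose order dependencies.
        random.Random(shuffle_seed).shuffle(selected)
    selected.sort(key=lambda t: t.priority)   # stable sort keeps the shuffled order
    results = []
    for test in selected:
        passed = test.run()
        results.append((test.name, passed))
        if not passed and test.stop_on_fail:
            break
    return results


suite = [
    CaseInfo("power_on", lambda: True, {"power"}, priority=0, stop_on_fail=True),
    CaseInfo("sram_walking_ones", lambda: True, {"sram"}),
    CaseInfo("sram_retention", lambda: False, {"sram"}),
    CaseInfo("uart_loopback", lambda: True, {"uart"}),
]

sram_results = run_suite(suite, tag="sram")
```

Passing the shuffle seed explicitly means that an ordering which exposes a problem can be reproduced later.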

What is a good method of doing TDD with legacy Delphi code having embedded SQL

I have to take some legacy Delphi code pointing to a database and make it support a new, better database that has a completely different schema. The updated database has the same data. It has a combination of stored procedures and embedded SQL.
Is there a good test-driven development technique that will help make sure I don't break anything? This code has almost no unit tests and I need to make changes to a lot of hard-coded SQL.
Just running after every change sounds error prone and time consuming. I love the idea of doing TDD or BDD, just not sure how to do it.
It's good that you want to get into unit testing, but I'd like to caution you against taking it on over-zealously.
Adding unit tests to legacy code is a major undertaking, and it's almost always totally unfeasible to halt other work just to add test cases. Also, unless you already have experience in TDD, that learning curve itself can prove a troublesome hurdle to overcome.
However, if you persevere, and take things one step at a time, your efforts will be rewarded in the end.
The problems you're likely to encounter:
Legacy applications are usually very difficult to 'retro-fit' with test cases. This is because the code wasn't written with testability in mind.
Many routines are doing too many things, so tests have to consider large numbers of side-effects.
Code is not properly self-contained, so setting up pre-conditions for a test is a lot of work.
Entry points for testing/checking behaviour are often missing because they weren't needed for production code; and therefore weren't added in the first place.
Code often relies on global state somewhere, either directly or via Singletons. This global state (regardless of where it lies) plays havoc with your test cases.
Unit testing of databases is inherently more difficult than other kinds of unit testing. The reason for this is that test cases don't like global state - and databases are effectively massive containers of global state. Problems manifest themselves in many ways:
If you're using IDENTITY columns, Auto Inc or number generators of any form: These either result in a specific difference between each test run, or you need a way to reset those numbers between tests.
Databases are slow. Once you've built up a large number of test cases it will be impractical to run all tests between every change. (One of my Db Test suites takes almost 10 minutes to run.)
If your database generates date/time values, these can also complicate testing. Especially if the database runs on a different machine.
Database testing is complicated by the fact that there are two aspects to the database: Its schema, and its data. So if you wish to test a new/changed stored procedure (part of the schema), it needs appropriate changes to the data and possibly to other aspects of the schema (such as tables/views).
Even without the above extra complications, there are the 'normal problems' you'll have to deal with.
Global state often crops up unexpectedly in some awkward places. Consider Now() which returns a TDateTime. It uses global state: the current date-time. If you have time/date based rules in your system, those rules may return different results depending on when your tests are run. Unless you find an effective way to deal with this challenge, you'll have a number of "erratic" test cases.
Writing test cases is a fundamentally different programming paradigm to what most developers are used to. It can be extremely difficult to break old habits. The style of test case code is almost declarative: Given this, When I do This, I expect this to have happened. Test cases need to be simple and clear about what they're trying to achieve.
The learning curve can be tricky. Initially you may find yourself taking 3 times as long to write code if unfamiliar with test cases. And even though it will eventually improve (possibly even to the point where you're faster than you used to be with unstructured and haphazard testing) - other people around you will likely express frustration. (Not cool if it's your boss.)
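One common way to tame the Now() problem mentioned above is to let the caller supply the current time, so tests can pin it. A minimal sketch in Python rather than Delphi; the rule is_overdue is hypothetical.

```python
from datetime import datetime


def is_overdue(due, now=None):
    """Hypothetical date-based rule; `now` can be injected by tests."""
    if now is None:
        now = datetime.now()   # production path: read the real clock
    return now > due


# In a test, the clock is fixed, so the result never depends on when the test runs.
fixed_now = datetime(2024, 1, 15, 12, 0, 0)
assert is_overdue(datetime(2024, 1, 14), now=fixed_now)
assert not is_overdue(datetime(2024, 1, 16), now=fixed_now)
```

The same idea should carry over to Delphi by passing a TDateTime parameter (or a small clock object) into the routine instead of calling Now() inside it.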
Hopefully I haven't discouraged you, I do have some practical advice:
As the saying goes "Don't bite off more than you can chew."
Be prepared to start out slow. For the time being, carry on with most of your work in a way that's familiar to you. But force yourself to write 1 or 2 test cases every day. As you get more comfortable, you can increase this number.
Try to stick to the "tried and tested principles"
The TDD workflow is: first write the test and ensure the test fails. I know it is difficult to stick to the habit, but the principle serves a very important purpose. It's a level of confirmation that your test case proves the bug / missing feature. Far too often I've seen test case code that would pass with/without the production change - making the test somewhat useless.
For your database tests you'll need to establish a framework that works for you.
First, you'll need a mechanism of getting your database to a 'base-state'. One from which all your tests should be able to pass - no matter what order or how many times they are run. Typically this will involve some sort of Reset between tests (but it needs to be quite quick). Second, you'll need an easy way to update the schema of your database to what is expected by production code.
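A minimal sketch of such a base-state mechanism, in Python with an in-memory SQLite database standing in for the real server. The table and rows are invented; on a real database server the reset would more likely be a transaction rollback or a scripted restore, which is an assumption on my part.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customer (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL);
"""
BASE_ROWS = [("Alice",), ("Bob",)]


def fresh_database():
    """Return a new in-memory database in the known base state."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO customer (name) VALUES (?)", BASE_ROWS)
    conn.commit()
    return conn


def test_insert_gets_next_id():
    conn = fresh_database()
    cur = conn.execute("INSERT INTO customer (name) VALUES (?)", ("Carol",))
    assert cur.lastrowid == 3      # stable because the database is rebuilt per test
    conn.close()


def test_base_state_has_two_customers():
    conn = fresh_database()
    assert conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0] == 2
    conn.close()


# The tests pass in any order and any number of times.
for test in (test_insert_gets_next_id, test_base_state_has_two_customers,
             test_insert_gets_next_id):
    test()
```

Rebuilding the database for every test also sidesteps the auto-increment problem listed earlier, since generated ids restart from the same value each time.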
Initially you'll only want to test new features, or bug fixes.
Avoid the temptation to test everything. Over time, your test case coverage will increase. Once your framework and patterns have been established, then you might get a chance to start adding tests just to increase coverage.
Refactoring existing code.
As you become familiar with testing, you'll learn about the coding habits that make testing more difficult. You'll probably find many such problems in legacy code. Such code will not be testable as is. You may need to refactor your code before you can even test it. Obviously this is not ideal, because you'd rather have tests that always pass to prove that your changes haven't broken anything. A good book on refactoring will give you some techniques you can use that will change the structure of your code without changing its behaviour.
Testing existing code.
When writing a test for an existing routine, look at the code and determine each of the inputs that can effect different behaviour. E.g. When there's an if statement, something will cause the condition to evaluate to True, and something else to False. At a minimum, you'll want a test for each permutation.
In your place I would use DUnit to create a unit test project. For each of the entities I would write test methods that run the old and new SQL statements, and then write methods to compare the results.
I would write a TTestCase subclass named, let's say, TMyTestCase, and add some helper methods to it; I would then create my new test classes as subclasses of TMyTestCase.
The idea of the ancestor class is to provide common functionality that makes it easier to write the tests (the comparison methods, for instance) in order to enhance productivity and comfort.
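The comparison helper might look roughly like this. The sketch is in Python with SQLite rather than DUnit, and both schemas and both queries are invented purely to show the shape of an old-versus-new comparison.

```python
import sqlite3


def rows(conn, sql, params=()):
    """Run a query and return its rows sorted, so ordering differences are ignored."""
    return sorted(conn.execute(sql, params).fetchall())


def make_old_db():
    """Hypothetical legacy schema: one flat table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE CUST (CUST_NO INTEGER, CUST_NAME TEXT, CITY TEXT);
        INSERT INTO CUST VALUES (1, 'Alice', 'Oslo'), (2, 'Bob', 'Bergen');
    """)
    return conn


def make_new_db():
    """Hypothetical new schema holding the same data, normalised."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city_id INTEGER);
        INSERT INTO city VALUES (10, 'Oslo'), (20, 'Bergen');
        INSERT INTO customer VALUES (1, 'Alice', 10), (2, 'Bob', 20);
    """)
    return conn


OLD_SQL = "SELECT CUST_NO, CUST_NAME, CITY FROM CUST WHERE CITY = ?"
NEW_SQL = """
    SELECT c.id, c.name, ci.name
    FROM customer c JOIN city ci ON ci.id = c.city_id
    WHERE ci.name = ?
"""


def assert_same_result(params):
    """Shared helper: the old and new statements must return the same rows."""
    old = rows(make_old_db(), OLD_SQL, params)
    new = rows(make_new_db(), NEW_SQL, params)
    assert old == new, f"old={old!r} new={new!r}"
    return old


assert_same_result(("Oslo",))
```

In DUnit, assert_same_result would be a method on the common ancestor class, and each entity's test class would call it with its own pair of statements.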
You can start building a database simulator. Connect it instead of the old one and see what it needs to do. Lot of work though

In functional testing, should I compare all tabular data rendered in the browser with the one coming from the DB?

I'm working on a test plan for a website where some tests are taking the following path:
Hit the requested URI and get the data rendered inside some table (20 rows per page).
Make a database query to get the data that is supposed to be rendered in that table.
Compare the two data sets row by row; they should match.
Is that a correct way of doing functional testing? Would the answer change if the request were an Ajax request? Would the answer differ for integration testing?
I have some reason that makes me believe that this is wrong somehow.... still need your opinions guys!
Yes, this could be a productive test. Either you have a fixed data set or you don't.
If you have a fixed data set, this is much easier to test, because all you're doing is comparing against a fixed output.
If you don't have a fixed data set, then you need to duplicate the business logic, effectively duplicating the work already done by the developer. Then you have two sets of logic to maintain.
The second is the best approach because you get two ways of doing the same thing, effectively a peer review of the specification and code. It's also very expensive in terms of time and resources, which is why most people choose to have a fixed data set.
To answer your question, if your business logic in the query is simple, then you can get a test very easily. However, the value that the test brings isn't great, because you aren't testing very much.
If the business logic is complex, you are getting more value from the test, but it's going to be harder to maintain in the long term.
For me, what your test does bring is a simple integration test that proves that the system reads correctly from the database, and displays the data correctly. This is a good test, even better if it is automated.
This seems fine for functional testing. Integration testing in my mind has to do with the testing of different technologies or components that are supposed to work together which is generally broader than functional testing. But of course this sort of testing could also be considered integration testing, depending on how your application is put together and where the testing is happening in the lifecycle of your development. For example it may be that in order for this site to work you have to put together a few components that were developed independently; this might be one of the tests to validate that the integration works.
Don't see how this being Ajax or not has anything to do with making the answer different.
I will likely be a dissenting opinion here, but I don't consider this to be a productive test. What you are doing is simply duplicating the code which produces the page. And any time you introduce duplicated code (even across departments) you'll be looking at defects cropping up long-term.
It is far better to load the DB with known data (either through the app, or directly) and then check that the output matches what you'd expect. This also ensures that your DB layer, or DB itself, hasn't modified the data in a way you do not expect.
That is:
Load known data (preferably through the app itself)
Load the requested URI
Check that displayed data matches your known data
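The three steps above can be sketched in Python using only the standard library. The function render_page is a hypothetical stand-in for the application; in a real test the HTML would come from an HTTP request to the requested URI after the known data has been loaded.

```python
from html.parser import HTMLParser

KNOWN_DATA = [("Alice", "Oslo"), ("Bob", "Bergen")]


def render_page(data_rows):
    """Hypothetical stand-in for the application: renders rows as an HTML table."""
    body = "".join(
        "<tr>" + "".join(f"<td>{cell}</td>" for cell in row) + "</tr>"
        for row in data_rows
    )
    return f"<html><body><table>{body}</table></body></html>"


class TableReader(HTMLParser):
    """Collect the text of every <td>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def displayed_rows(html):
    """Parse the page and return the table contents as a list of tuples."""
    reader = TableReader()
    reader.feed(html)
    reader.close()
    return [tuple(row) for row in reader.rows]


# The check compares against the known data, not against a second database query.
assert displayed_rows(render_page(KNOWN_DATA)) == KNOWN_DATA
```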
This kind of test could be good for testing a large set of data with relatively little tester effort if there is not much developer logic between the database and the display to the end user. Our team has done this on a number of occasions, and it is especially useful for running large quantities of real production data through our tests to be sure that actual scenarios are handled as expected. Do make sure you do at least a little fixed input testing for rare scenarios that might be especially likely to be handled differently in the DB and on the web page - null values, special characters, and other oddities.
Personally, I would call this "integration testing", since you are testing the integration of the DB and the web site, and not "functional testing". For "functional testing", I'd probably want to make a mock of the datasource (e.g., the database) that will provide pre-written sets of data in the format you expect.
Having said that, if I had high confidence in the validity of the DB data and if the logic between the DB query and the web page display was very small and low-risk, I would probably not bother with the mock and would let the integration test cover the functionality as well. I don't know that testing the functionality and integration separately would be a big quality win in this case, and there are likely better things you could do with the available testing time. If there is a lot of logic around this data, you should probably test the integration separately from the functionality. Additional integration testing would probably include things like, "What if the database can't be reached?" and "What if the database is slow?".
While this technique will work with Ajax, make sure your testing tools will work with Ajax. Specifically, think about how you will capture the database query results and how you will gather the results displayed on the web page.
I'm assuming that the validity of the data in the query is being tested elsewhere, since you mentioned that this was just one type of test in the test plan. I'm also just discussing integration with the database and this report and not other features or components, and not other aspects of testing (performance, security, etc.), since that was the scope of your question.

How to use automation for testing application involving highly complex calculations?

I want to know the following things about testing an application involving complex calculations:
How can I use test automation tools (such as QTP or open-source tools) for testing calculations?
How do I decide on coverage when testing calculations, and how should I design the test cases?
Thanks in advance,
Testmann
We had to test some really complex calculations in an application we built. To do this we used a tool called FitNesse, which is a wiki test harness (and open source too). It works really well when you provide it data in a table-style format.
We had some code in C# that performs some VERY complex calculations. So what we did was write a test harness in FitNesse, and then we supplied it with a lot of test data. We worked very hard to cover all cases, so we utilized a sort of internal truth table to ensure we were getting every possible combination of data input.
The FitNesse test harness has been invaluable to us as the complexity of the calculations has changed over time due to changing requirements. We've been able to ensure the correctness of the calculations because our FitNesse tests act as a very nice regression suite.
Sometimes, you have to estimate the expected conclusion, and then populate the test case from a run of the program.
It's not so much of a mortal sin, as long as you're convinced it's correct. Those tests will then let you know immediately if a code change breaks the code. Also, if you're testing a subset, it's not that big of a stretch of trust.
And for coverage? Cover every branch at least once (that is, any if or loop statements). Cover every threshold, both sides of it (for integer division that would be -1, 0, and 1 as denominators). Then add a few more in for good measure.
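As a small illustration of covering both sides of a threshold, here is a table-driven sketch in Python for the integer-division example. The function per_share and its zero-denominator behaviour are assumptions made up for the example.

```python
def per_share(total, shares):
    """Hypothetical calculation using integer (floor) division."""
    if shares == 0:
        raise ValueError("shares must be non-zero")
    return total // shares


# (total, shares, expected): denominators on both sides of zero, plus rounding cases.
cases = [
    (10, 1, 10),
    (10, -1, -10),
    (7, 2, 3),
    (-7, 2, -4),   # Python's // rounds toward negative infinity
]


def check_cases():
    """Return the number of cases checked; raise AssertionError on any mismatch."""
    for total, shares, expected in cases:
        assert per_share(total, shares) == expected, (total, shares)
    # The threshold itself: a zero denominator must be rejected, not computed.
    try:
        per_share(10, 0)
    except ValueError:
        pass
    else:
        raise AssertionError("zero denominator should raise ValueError")
    return len(cases) + 1


check_cases()
```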
To test existing code, you should assume that the code is (mostly) correct. So you just give it some data, run it and record the result. Then use that recorded result in a test case.
When you do the next change, your output should change too and the test will fail. Compare the new result with what you'd have expected. If there is a discrepancy, then you're missing something -> write another test to figure out what is going on.
This way, you can build expertise about an unknown system.
When you ask for coverage, I assume that you can't create coverage data for the actual calculations. In this case, just make sure that all calculations are executed and feed them with several inputs. That should give you an idea how to proceed.