Why clean up a DB after a test run?

I have several test suites that read and write data from a dedicated database when they are run. My strategy is to assume that the DB is in an unreliable state before a test is run; if I need certain records in certain tables, or an empty table, I do that setup before the test runs.
My attitude is not to clean up the DB at the end of each test suite, because each test suite should do its own cleanup and setup before it runs. Also, when I'm trying to "visually" debug a test suite, it helps that the final state of the DB persists after the tests have completed.
Is there a compelling reason to cleanup a DB after your tests have run?

Depends on your tests, what happens after your tests, and how many people are doing testing.
If you're just testing locally, then no, cleaning up after yourself isn't as important, so long as you're consistently employing this philosophy AND you have a process in place to make sure the database is in a known-good state before doing something other than testing.
If you're part of a team, then yes, leaving your test junk behind can screw up other people/processes, and you should clean up after yourself.

In addition to the previous answer, I'd also mention that this matters most when executing integration tests: integrated modules work together and in conjunction with infrastructure such as message queues and databases, and each independent part must work correctly with the services it depends on.
Cleaning up the DB after a test run helps you isolate test data. A best practice here is to use transactions for database-dependent tests (e.g., component tests) and roll back the transaction when done. Use a small subset of data to test behavior effectively. Think of it as a Database Sandbox, using the Isolate Test Data pattern: each developer can use this lightweight DML to populate their local database sandbox and speed up test execution.
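A minimal sketch of that transaction-per-test pattern, assuming NUnit and Microsoft.Data.SqlClient against a SQL Server test database (the connection string and the Orders table are placeholder illustrations, not anything from the original posts):

```csharp
// Transaction-rollback sandbox: every test runs inside a transaction that is
// rolled back in teardown, so no test leaves data behind.
using Microsoft.Data.SqlClient;
using NUnit.Framework;

[TestFixture]
public class OrderRepositoryTests
{
    private SqlConnection _connection;
    private SqlTransaction _transaction;

    [SetUp]
    public void OpenTransaction()
    {
        _connection = new SqlConnection("Server=.;Database=TestDb;Integrated Security=true");
        _connection.Open();
        _transaction = _connection.BeginTransaction();
    }

    [TearDown]
    public void RollBack()
    {
        // Undo everything the test did, leaving the sandbox clean for the next test.
        _transaction.Rollback();
        _connection.Dispose();
    }

    [Test]
    public void InsertedOrderIsVisibleInsideTheTransaction()
    {
        using var insert = new SqlCommand(
            "INSERT INTO Orders (CustomerId, Total) VALUES (42, 99.50)",
            _connection, _transaction);
        insert.ExecuteNonQuery();

        using var count = new SqlCommand(
            "SELECT COUNT(*) FROM Orders WHERE CustomerId = 42",
            _connection, _transaction);
        Assert.That((int)count.ExecuteScalar(), Is.EqualTo(1));
    }
}
```

Because the rollback happens in teardown, nothing a test writes survives it, and every test starts from the same sandbox state.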
Another advantage is that you decouple your database: make the application backward and forward compatible with your database so you can deploy each independently. Patterns like Encapsulate Table with View, and NoSQL databases, ensure that you can deploy two application versions at once without either of them throwing database-related errors. This was particularly successful in a project where the database had to be accessed through stored procedures.
All of this is one of the concepts used in virtual test labs.

In addition to the above answers, I'll add a few more points:
The DB shouldn't be cleaned after a test, because that's where you have your test data, test results, and all the history that can be referred to later on.
The DB should be cleaned only if you changed some application setting to run a specific test, so that it doesn't impact other testers.

Related

Is it a good practice to check the state of a database in acceptance tests?

Consider that one is supposed to write automated acceptance tests for an e-commerce system. For example, you want to check that when a customer completes the checkout operation, a new order is registered in the system and linked to their account. Now, one thing you can check is that some UI message is displayed, like 'Order completed successfully'. However, that does not guarantee that the order is in fact persisted in the DB. My question is whether it is OK to additionally verify, by querying the DB, that the order was indeed saved. Or should I verify that implicitly in another acceptance test, e.g. one that performs the checkout operation and then checks the list of orders (which should not be empty after a successful checkout)?
one thing you can check is that some UI message is displayed, like 'Order completed successfully'. However, that does not guarantee that the order is in fact persisted in the DB.
Actually, it depends. If we're talking Selenium, they do suggest such database validation:
Another common type of testing is to compare data in the UI against the data actually stored in the AUT’s database. Since you can also do database queries from a programming language, assuming you have database support functions, you can use them to retrieve data and then use the data to verify what’s displayed by the AUT is correct.
However, testing the acceptance criteria should be done only after they are clear enough. If there is no such specific E2E requirement, you shouldn't include this DB check in these tests. You could put it at your functional and integration levels, where the SUT architecture allows such a black-box approach. If we consider the typical N-tier stack (UI-Backend-DB), your black box will be the middleware: input from the UI, [skip everything in between], output is in the DB.
This of course will introduce a bit more complexity, and your tests will become more brittle (which is especially true for the UI ones). Furthermore, you should think about expensive objects and keeping/disposing of them properly (e.g. one DB connection per suite run).
IMHO you should have all of this covered in your automated tests:
it is OK to additionally verify, by querying the DB, that the order was indeed saved. Or should I verify that implicitly in another acceptance test
And the only question would be where to put them.
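For illustration, a hedged sketch of doing both checks in one flow, assuming Selenium WebDriver for .NET plus a SQL Server test database; the URL, selectors, connection string, and table names are all hypothetical:

```csharp
// Check the UI claim first, then confirm the order was actually persisted.
using System;
using Microsoft.Data.SqlClient;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

public static class CheckoutAcceptanceTest
{
    public static void Run()
    {
        using IWebDriver driver = new ChromeDriver();
        driver.Navigate().GoToUrl("https://shop.example.test/checkout");
        driver.FindElement(By.Id("place-order")).Click();

        // First assertion: the UI claims success.
        string banner = driver.FindElement(By.CssSelector(".order-status")).Text;
        if (!banner.Contains("Order completed successfully"))
            throw new Exception("UI did not report a successful order.");

        // Second assertion: the order really is in the database.
        using var connection = new SqlConnection("Server=.;Database=ShopTest;Integrated Security=true");
        connection.Open();
        using var command = new SqlCommand(
            "SELECT COUNT(*) FROM Orders WHERE CustomerId = @id", connection);
        command.Parameters.AddWithValue("@id", 42);
        if ((int)command.ExecuteScalar() == 0)
            throw new Exception("Order was not persisted to the database.");
    }
}
```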

Testing an n-tier web application - should my test project have its own database?

In an n-tier web-app, should I be running integration tests against a different database, one dedicated to testing the code? Is it standard practice to test against the production database as well?
You should never run untested code on production. After all, you don't want to discover that it has a bug that wipes out all data. That's what tests are supposed to find. And you should not have test/staging data in the production system. It is good practice to dump the data out of production and load it into another environment for periodic testing with real-world data.
You should have a test database (not shared with production). It's a good idea to wipe out the data before every test.
You can have smoke tests that run in production. They will pretend to be a user (agent) and visit many pages, maybe even create things (with a special tag so you can find them again and delete them), as sketched below.
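A minimal sketch of that tagging idea, assuming a SQL Server Orders table with a free-text Notes column; the table, column, and tag are placeholders:

```csharp
// Tagged smoke-test data: rows carry a marker so they can be found and deleted.
using Microsoft.Data.SqlClient;

public static class SmokeTestData
{
    private const string Tag = "[SMOKE-TEST]";

    public static void CreateTaggedOrder(SqlConnection connection)
    {
        using var insert = new SqlCommand(
            "INSERT INTO Orders (CustomerId, Notes) VALUES (@id, @notes)", connection);
        insert.Parameters.AddWithValue("@id", 0);
        insert.Parameters.AddWithValue("@notes", Tag + " created by smoke test");
        insert.ExecuteNonQuery();
    }

    public static void DeleteTaggedOrders(SqlConnection connection)
    {
        // The tag makes the smoke-test rows easy to find and remove afterwards.
        using var delete = new SqlCommand(
            "DELETE FROM Orders WHERE Notes LIKE @tag + '%'", connection);
        delete.Parameters.AddWithValue("@tag", Tag);
        delete.ExecuteNonQuery();
    }
}
```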
I'd rather think of a different database user with its own data set. The database schema should be the same. I'd never run tests on the production database with the same database user. Test logic shouldn't even be delivered to the client, as it may lead to severe security issues.
In my opinion you need a full production-like data set for testing purposes, to be able to test every single feature of your application. You would also need an empty database (without any business data) for application clients to have as an initial point on delivery. Such a data set doesn't need to be tested, as there is no data in it to exercise the business logic.

Should test data be used in production?

We are deploying an update to our main application in production. The update has been tested in QA and it looks good to go. Our client wants to do a test in production. For that, we will run the application using "test data" in production, and once the test has finished, we will delete the "test data".
A couple of server admins are against this because "test data doesn't belong to production". I think it's OK since the QA server and the production server have different hardware and the databases house different applications (QA has more databases, production is dedicated). Besides that, are there other facts that I can use to back my opinion?
EDIT: adding context
The application is a tool that automates the reception and validation of data. We receive the files via email, and this tool automatically validates them and imports them into the database. We have a BI system that creates reports using this information (Excel files are received by email, then validated, then reports/views come out, all of this automated).
The "test data" would be old files (good and bad files from previous efforts) that represent real data (actually it is real data, just with problems or too old).
Yes! But manual usage of test data in production does not sound like a good idea to me, as it cannot be controlled or monitored. My answer below assumes the test data is used for automated testing.
Test data in production is today's need. It was not a requirement back when automated testing was not a requirement (or did not exist), so in general this will be frowned upon. Security is the main reason; its impact in messing up site analytics is another. These are genuine and good reasons.
One cannot decide one day to simply put test data in production, especially towards the end of a project. This needs to be made a requirement from the time development starts, so the test data needs to be in production from the very first deployment onwards, and its impact needs to be studied and documented. The organization as a whole needs to understand its benefits and impact.
Test data needs to be divided based on its type, need, or context, e.g. retrievable test data and editable test data. The first step would be to have retrievable (read-only, never changing) test data available. Perhaps this is the farthest we can go in many cases, and it would still provide good results. The creation of this read-only test data needs to be automated and preferably documented.
The benefits of having test data in production are huge. An automated test of an application is more precious than the application itself; if management realizes that, then at least the initial "frown" changes. I feel test data in production should be considered a requirement/user story, and all problems raised against it should be mitigated. New patterns of development need to evolve in this area.
This discussion is also related to integration testing, and this article focuses on its benefits over unit testing.
Your admins are right. Having test data in production exposes you to risks (security holes):
Test data in production can be used to do damage to your company (intentional or unintentional).
For example, if you have non-existing identities in production, payments can be made to them. If they are linked to real bank accounts, you lose money without the ability to detect it.
Test data can change your management reports. Fake activity can influence reports and have an impact on the decisions made. This will be very hard to track and even harder to correct.
Test data can interact with production data. If someone makes a mistake and creates a wrong relation, production data can be changed based on test data.
There is no good way of detecting that you have test data unless you mark it, and then any data could be marked as test data. If you handle the test data differently in your business layer, it would not be a real test of your production environment.
Nowadays it is good practice to have a Staging environment with the same infrastructure configuration as Production, so you can execute pentests, load tests, and whatever else you need to ensure that Production will behave as you expect.

SQLite/Fluent NHibernate integration test harness initialization not repeatable after large data session

In one of my main data integration test harnesses I create and use Fluent NHibernate's SingleConnectionSessionSourceForSQLiteInMemoryTesting, to get a fresh session for each test. After each test, I close the connection, session, and session factory, and throw out the nested StructureMap container they came from. This works for almost any simple data integration test I can think of, including ones that utilize Fluent NHib's PersistenceSpecification object.
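For readers unfamiliar with that setup, a rough hand-rolled equivalent looks something like this. This is a hedged sketch, not the actual harness: OrderMap is a placeholder mapping class, and the SchemaExport overload shown varies across NHibernate versions.

```csharp
// Per-test in-memory SQLite session with Fluent NHibernate.
using FluentNHibernate.Cfg;
using FluentNHibernate.Cfg.Db;
using NHibernate;
using NHibernate.Cfg;
using NHibernate.Tool.hbm2ddl;

public static class InMemorySession
{
    public static ISession Open()
    {
        Configuration config = null;
        ISessionFactory factory = Fluently.Configure()
            .Database(SQLiteConfiguration.Standard.InMemory())
            .Mappings(m => m.FluentMappings.AddFromAssemblyOf<OrderMap>())
            .ExposeConfiguration(cfg => config = cfg)
            .BuildSessionFactory();

        // The schema must be created on the SAME connection: an in-memory
        // SQLite database disappears as soon as its connection closes.
        // (The exact Execute overload depends on your NHibernate version.)
        ISession session = factory.OpenSession();
        new SchemaExport(config).Execute(false, true, false, session.Connection, null);
        return session;
    }
}
```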
The trouble starts when I test the application's lengthy database bootstrapping process, which creates and saves thousands of domain objects. It's not that the first setup and tear-down of the test harness fails; in fact, the test harness successfully bootstraps the in-memory database just as the application would bootstrap the real database in the production environment. The problem occurs when the database is bootstrapped a second time on a new in-memory database, with a new session and session factory.
The error is:
NHibernate.StaleStateException : Unexpected row count: 0; expected: 1
The row count is indeed unexpected; the row that the application under test is looking for should be in the session. You see, it's not that any data from the last integration test is sticking around, it's that for some reason the session just stops working mid-database-bootstrap. And I've looked everywhere for a place I might be holding on to an old session, and I can't find one.
I've searched through the code for static singleton objects, but there are none anywhere near the code in question. I have a couple of StructureMap InstanceScope singletons, but they are getting thrown out with each nested container that is lost after every test teardown.
I've tried every possible variation on disposing and closing every object involved with each test teardown and it still fails on this lengthy database bootstrap. I've even messed around with current_session_context_class to no avail. But non-bootstrap related database tests appear to work fine. I'm starting to run out of options and I may have to surrender lengthy database integration tests in favor of WatiN-based acceptance tests.
Can anyone give me any clue about how I can figure out why some of my SingleConnectionSessionSourceForSQLiteInMemoryTesting sessions aren't repeatable?
Any advice at all about how to make an NHibernate SQLite database integration test harness repeatable for large data sessions?
Here is how we do it: http://www.gears4.net/blog/archive/14/nhibernate-integration-testing
Hope it helps.
I was able to solve this problem by not using an in-memory database, and instead saving a hard copy of the file after initialization, once per test suite run. Then, instead of reinitializing the database for each test, the file-based SQLite database is copied, and that new copy is used for testing.
Essentially, I only set up the initial database data once, save that database off to the side, and copy it for each test. There is a definite possibility that the problem is on my end, but I suspect there is an issue with large in-memory SQLite databases, so I recommend using the file mode of the database if you are running into trouble with a large in-memory SQLite database.
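A minimal sketch of that copy-per-test approach, assuming Fluent NHibernate with a file-based SQLite database; the paths and the OrderMap mapping class are placeholders:

```csharp
// Bootstrap once into a template file, then hand each test a fresh copy.
using System.IO;
using FluentNHibernate.Cfg;
using FluentNHibernate.Cfg.Db;
using NHibernate;

public static class TemplateDatabase
{
    private const string Template = "template.db"; // bootstrapped once per suite run

    public static ISessionFactory OpenFreshCopy(string testDbPath)
    {
        // Copying the file is much cheaper than re-running the bootstrap,
        // and every test starts from the same known-good state.
        File.Copy(Template, testDbPath, overwrite: true);

        return Fluently.Configure()
            .Database(SQLiteConfiguration.Standard.UsingFile(testDbPath))
            .Mappings(m => m.FluentMappings.AddFromAssemblyOf<OrderMap>())
            .BuildSessionFactory();
    }
}
```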

Spawning multiple SQL tasks in SQL Server 2005

I have a number of stored procs which I would like to run simultaneously on the server, ideally all on the server without reliance on connections to an external client.
What options are there to launch all these and have them run simultaneously (I don't even need to wait until all the processes are done to do additional work)?
I have thought of:
Launching multiple connections from a client, having each start the appropriate SP.
Setting up jobs for each SP and starting the jobs from a SQL Server connection or SP.
Using xp_cmdshell to start additional runs equivalent to osql or whatever.
SSIS - I need to see if the package can be dynamically written to handle more SPs, because I'm not sure how much access my clients are going to get to production.
In the job and cmdshell cases, I'm probably going to run into permission-level problems from the DBA...
SSIS could be a good option - if I can table-drive the SP list.
This is a data warehouse situation, and the work is largely independent; NOLOCK is universally used on the stars. The system is an 8-way 32 GB machine, so I'm going to load it down and scale it back if I see problems.
I basically have three layers. Layer 1 has a small number of processes and depends on basically all the facts/dimensions already being loaded (effectively, the stars are a Layer 0 - and yes, unfortunately they will all need to be loaded). Layer 2 has a number of processes which depend on some or all of Layer 1, and Layer 3 has a number of processes which depend on some or all of Layer 2. I have the dependencies in a table already, and would initially only launch all the procs in a particular layer at the same time, since they are orthogonal within a layer.
Is SSIS an option for you? You can create a simple package with parallel Execute SQL tasks to execute the stored procs simultaneously. However, depending on what your stored procs do, you may or may not get benefit from starting this in parallel (e.g. if they all access the same table records, one may have to wait for locks to be released etc.)
At one point I did some architectural work on a product known as Acumen Advantage that has a warehouse manager that does this.
The basic strategy for this is to have a control DB with a list of the sprocs and their dependencies. Based on the dependencies you can do a Topological Sort to give them an order to run in. If you do this, you need to manage the dependencies - all of the predecessors of a stored procedure must complete before it executes. Just starting the sprocs in order on multiple threads will not accomplish this by itself.
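To make the ordering concrete, here is a hedged sketch of that topological sort (Kahn's algorithm) over a dependency map; the control-table lookup is reduced to an in-memory dictionary, and the sproc names would come from your own control DB:

```csharp
// Orders stored procedures so every predecessor appears before its dependents.
using System;
using System.Collections.Generic;
using System.Linq;

public static class SprocScheduler
{
    public static List<string> TopologicalOrder(Dictionary<string, List<string>> dependsOn)
    {
        var remaining = dependsOn.ToDictionary(kv => kv.Key, kv => new HashSet<string>(kv.Value));
        var order = new List<string>();

        while (remaining.Count > 0)
        {
            // Every sproc whose prerequisites are all done can run now; within
            // a "layer" these are orthogonal and could be launched in parallel.
            var ready = remaining.Where(kv => kv.Value.Count == 0)
                                 .Select(kv => kv.Key).ToList();
            if (ready.Count == 0)
                throw new InvalidOperationException("Cyclic dependency among stored procedures.");

            foreach (var sproc in ready)
            {
                order.Add(sproc);
                remaining.Remove(sproc);
            }
            foreach (var prereqs in remaining.Values)
                prereqs.ExceptWith(ready);
        }
        return order;
    }
}
```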
Implementing this meant knocking much of the SSIS functionality on the head and implementing another scheduler. This is OK for a product but probably overkill for a bespoke system. A simpler solution is thus:
You can manage the dependencies at a more coarse-grained level by organising the ETL vertically by dimension (sometimes known as Subject Oriented ETL) where a single SSIS package and set of sprocs takes the data from extraction through to producing dimensions or fact tables. Typically the dimensions will mostly be siloed, so they will have minimal interdependency. Where there is interdependency, make one dimension (or fact table) load process dependent on whatever it needs upstream.
Each loader becomes relatively modular and you still get a useful degree of parallelism by kicking off the load processes in parallel and letting the SSIS scheduler work it out. The dependencies will contain some redundancy. For example an ODS table may not be dependent on a dimension load being completed but the upstream package itself takes the components right through to the dimensional schema before it completes. However this is not likely to be an issue in practice for the following reasons:
The load process probably has plenty of other tasks that can execute in the meantime
The most resource-hungry tasks will almost certainly be the fact table loads, which will mostly not be dependent on each other. Where there is a dependency (e.g. a rollup table based on the contents of another table) this cannot be avoided anyway.
You can construct the SSIS packages so they pick up all of their configuration from an XML file whose location is supplied externally in an environment variable. This sort of thing can be fairly easily implemented with scheduling systems like Control-M.
This means that a modified SSIS package can be deployed with relatively little manual intervention. The production staff can be handed the packages to deploy along with the stored procedures, and can maintain the config files on a per-environment basis without having to manually fiddle with configuration in the SSIS packages.
You might want to look at Service Broker and its activation stored procedures... might be an option...
In the end, I created a C# management console program which launches the processes asynchronously as they become able to run and keeps track of the connections.
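Along those lines, a minimal sketch of such a launcher in modern C# (the connection string and proc names are placeholders, dependency tracking is omitted for brevity, and the original program would have used SQL Server 2005-era ADO.NET rather than async/await):

```csharp
// Launch each stored proc on its own connection so the calls run in parallel,
// then wait for the whole batch (a "layer") to finish.
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Data.SqlClient;

public static class ParallelSprocRunner
{
    private const string ConnectionString = "Server=.;Database=Warehouse;Integrated Security=true";

    public static async Task RunLayerAsync(params string[] sprocs)
    {
        // One connection per proc; procs within a layer are orthogonal.
        var tasks = sprocs.Select(RunOneAsync);
        await Task.WhenAll(tasks);
    }

    private static async Task RunOneAsync(string sproc)
    {
        using var connection = new SqlConnection(ConnectionString);
        await connection.OpenAsync();
        using var command = new SqlCommand(sproc, connection)
        {
            CommandType = System.Data.CommandType.StoredProcedure,
            CommandTimeout = 0 // warehouse loads can run long
        };
        await command.ExecuteNonQueryAsync();
    }
}
```

A caller would run one layer at a time, e.g. `await ParallelSprocRunner.RunLayerAsync("dbo.LoadLayer1A", "dbo.LoadLayer1B");` before moving on to the next layer's procs.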