We are deploying an update to our main application in production. The update has been tested in QA and looks good to go. Our client wants to do a test in production: we will run the application using "test data" in production, and once the test is finished, we will delete the "test data".
A couple of server admins are against this because "test data doesn't belong to production". I think it's OK since the QA server and the production server have different hardware and the databases house different applications (QA hosts more databases; production is dedicated). Besides that, are there other facts I can use to back up my opinion?
EDIT: adding context
The application is a tool that automates the reception and validation of data. We receive the files via email, and this tool automatically validates them and imports them into the database. We have a BI system that creates reports using this information (Excel files are received by email, validated, and turned into reports/views, all automatically).
The "test data" would be old files (good and bad files from previous efforts) that represent real data (it actually is real data, but with problems or just too old).
Yes! But manual use of test data in production does not sound like a good idea to me, as it cannot be controlled or monitored. My answer below assumes the test data is used for automated testing.
Test data in production is a need of today. It was not a requirement back when automated testing was not a requirement (or did not exist), so in general it will be frowned upon. Security is the main reason. Its impact on site analytics is another. These are genuine and good reasons.
One cannot simply decide one day to put test data in production, especially toward the end of a project. This needs to be made a requirement from the time development starts, so the test data is there in production from the very first deployment onwards. Its impact needs to be studied and documented, and the organization as a whole needs to understand its benefits and impact.
Test data needs to be divided based on its type, need or context, e.g. retrievable test data and editable test data. The first step would be to make retrievable (read-only, never changing) test data available. In many cases this is perhaps as far as we can go, but it would still provide good results. The creation of this read-only test data needs to be automated and preferably documented.
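As a minimal sketch of automated, read-only ("retrievable") test data, assuming SQLite via Python's standard `sqlite3` module: the seeding step is idempotent, and test code opens the database with SQLite's `mode=ro` URI parameter so it can read the fixture but not change it. The table name `test_customers` and its rows are hypothetical.

```python
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path

# Hypothetical reference rows; in practice they come from an automated, documented seeding step.
READ_ONLY_ROWS = [(1, "Test Customer A"), (2, "Test Customer B")]

def seed_read_only_data(db_path: Path) -> None:
    """Create the retrievable test data set; safe to run repeatedly."""
    with closing(sqlite3.connect(db_path)) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS test_customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )
        # INSERT OR IGNORE keeps the seed idempotent: existing ids are left untouched.
        conn.executemany(
            "INSERT OR IGNORE INTO test_customers (id, name) VALUES (?, ?)", READ_ONLY_ROWS
        )
        conn.commit()

def open_read_only(db_path: Path) -> sqlite3.Connection:
    """Open the database so tests can read the fixture but never modify it."""
    return sqlite3.connect(db_path.resolve().as_uri() + "?mode=ro", uri=True)

def write_is_rejected(conn: sqlite3.Connection) -> bool:
    """Return True if SQLite refuses a write on this connection."""
    try:
        conn.execute("INSERT INTO test_customers (id, name) VALUES (99, 'should fail')")
    except sqlite3.OperationalError:  # "attempt to write a readonly database"
        return True
    return False

def demo() -> tuple[int, bool]:
    """Seed the fixture in a temporary directory, then read it back read-only."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "fixture.db"
        seed_read_only_data(path)
        seed_read_only_data(path)  # re-running the seed does not duplicate rows
        with closing(open_read_only(path)) as ro:
            count = ro.execute("SELECT COUNT(*) FROM test_customers").fetchone()[0]
            return count, write_is_rejected(ro)
```

Editable test data would need a separate, writable store; keeping the two apart is what lets the read-only part stay trustworthy.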
The benefits of having test data in production are huge. An automated test of an application is more precious than the application itself. If management realizes that, the initial "frown" at least changes. I feel test data in production should be treated as a requirement/user story, and all problems against it should be mitigated. New development patterns need to evolve in this area.
This discussion is also related to integration testing; this article focuses on its benefits over unit testing.
Your admins are right. Having test data in production exposes you to the following risks (security holes):
Test data in production can be used to do damage to your company (intentionally or unintentionally).
For example, if you have non-existing identities in production, you can make payments to them. If they are linked to real bank accounts, you lose money without being able to detect it.
Test data can change your management reports. Fake actions can influence reports and affect the decisions made from them. This will be very hard to track and even harder to correct.
Test data can interact with production data. If someone makes a mistake and creates a wrong relation, production data can be changed based on test data.
There is no good way of detecting test data. Even if you mark it, any data can be marked as test data. And if you handle test data differently in your business layer, it would not be a real test of your production environment.
Nowadays it is good practice to have a staging environment with the same infrastructure configuration as production, so you can run pentests, load tests, and whatever else you need to ensure that production will behave as you expect.
I have several test suites that read and write data from a dedicated database when they are run. My strategy is to assume that the DB is in an unreliable state before a test is run and if I need certain records in certain tables or an empty table I do that setup before the test is run.
My attitude is not to clean up the DB at the end of each test suite, because each test suite should do its own cleanup and setup before it runs. Also, if I'm trying to "visually" debug a test suite, it helps that the final state of the DB persists after the tests have completed.
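A sketch of this setup-before-each-test strategy, assuming Python's `unittest` and `sqlite3`; the `orders` table and its baseline rows are hypothetical, and `:memory:` stands in for the dedicated test database. `setUp` forces a known state regardless of what was there before, and there is deliberately no `tearDown`.

```python
import sqlite3
import unittest

# Hypothetical schema and baseline rows for the dedicated test database.
SCHEMA = "CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL)"
BASELINE = [(1, "new"), (2, "shipped")]

def prepare_known_state(conn: sqlite3.Connection) -> None:
    """Assume the DB is in an unknown state: create what is missing, wipe, then seed."""
    conn.execute(SCHEMA)
    conn.execute("DELETE FROM orders")
    conn.executemany("INSERT INTO orders (id, status) VALUES (?, ?)", BASELINE)
    conn.commit()

class OrderSuite(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # A real suite would connect to the dedicated test database file instead.
        cls.conn = sqlite3.connect(":memory:")

    def setUp(self):
        prepare_known_state(self.conn)

    # No tearDown on purpose: the last test's data stays behind for inspection.

    def test_ship_order(self):
        self.conn.execute("UPDATE orders SET status = 'shipped' WHERE id = 1")
        shipped = self.conn.execute(
            "SELECT COUNT(*) FROM orders WHERE status = 'shipped'"
        ).fetchone()[0]
        self.assertEqual(shipped, 2)

    def test_starts_from_baseline(self):
        # Passes even if another test changed the data, because setUp reset it.
        new = self.conn.execute("SELECT COUNT(*) FROM orders WHERE status = 'new'").fetchone()[0]
        self.assertEqual(new, 1)
```

Run it with `python -m unittest <module>`. With a file-backed database, the final rows remain available after the run, which is the debugging benefit described in the question.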
Is there a compelling reason to clean up a DB after your tests have run?
Depends on your tests, what happens after your tests, and how many people are doing testing.
If you're just testing locally, then no, cleaning up after yourself isn't as important, so long as you're consistently employing this philosophy AND you have a process in place to make sure the database is in a known-good state before doing something other than testing.
If you're part of a team, then yes, leaving your test junk behind can screw up other people/processes, and you should clean up after yourself.
In addition to the previous answer, I'd like to mention that this is more relevant when running integration tests, since integrated modules work together with infrastructure such as message queues and databases, and each independent part has to work correctly with the services it depends on.
Cleaning up a DB after a test run helps you isolate test data. A best practice here is to use transactions for database-dependent tests (e.g., component tests) and roll back the transaction when done. Use a small subset of data to effectively test behavior. Think of it as a database sandbox, using the Isolate Test Data pattern. For example, each developer can use this lightweight DML to populate their local database sandbox to speed up test execution.
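A minimal sketch of the transaction-per-test idea, assuming SQLite through Python's `sqlite3`: the connection is opened in autocommit mode (`isolation_level=None`) so the helper controls `BEGIN`/`ROLLBACK` explicitly, and a small hypothetical `accounts` data set plays the role of the sandbox.

```python
import sqlite3
from contextlib import contextmanager

def make_sandbox() -> sqlite3.Connection:
    """Build a small, hypothetical data subset for a local database sandbox."""
    # isolation_level=None: autocommit mode, so BEGIN/ROLLBACK below are fully explicit.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts (id, balance) VALUES (?, ?)", [(1, 100), (2, 50)])
    return conn

@contextmanager
def rolled_back(conn: sqlite3.Connection):
    """Run one database-dependent test inside a transaction and always undo it."""
    conn.execute("BEGIN")
    try:
        yield conn
    finally:
        # Whatever the test wrote (or however it failed), the sandbox returns to its seed state.
        conn.execute("ROLLBACK")

conn = make_sandbox()
with rolled_back(conn) as c:
    c.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
    c.execute("DELETE FROM accounts WHERE id = 2")
print(conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())  # [(1, 100), (2, 50)]
```

This assumes the code under test does not commit on its own; a commit inside the block would end the transaction, and the final `ROLLBACK` would then fail.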
Another advantage is that you decouple your database: ensure that the application is backward and forward compatible with your database so you can deploy each independently. Patterns like Encapsulate Table with View, and NoSQL databases, ensure that you can deploy two application versions at once without either of them throwing database-related errors. This was particularly successful in a project where the database had to be accessed through stored procedures.
All of this is one of the concepts used in virtual test labs.
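Encapsulate Table with View can be sketched in SQLite as follows (hypothetical `customer` table, Python `sqlite3`): the physical table is replaced by a new one with a renamed column, and a view with the old name and column list keeps read queries from the previous application version working.

```python
import sqlite3

def migrate_with_view(conn: sqlite3.Connection) -> None:
    """Hypothetical migration: customer(id, name) becomes customer_data(id, full_name, email),
    while a view named 'customer' keeps the old shape for the previous app version."""
    conn.executescript(
        """
        CREATE TABLE customer_data (id INTEGER PRIMARY KEY, full_name TEXT, email TEXT);
        INSERT INTO customer_data (id, full_name) SELECT id, name FROM customer;
        DROP TABLE customer;
        -- Old readers still run: SELECT id, name FROM customer
        CREATE VIEW customer AS SELECT id, full_name AS name FROM customer_data;
        """
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
conn.commit()
migrate_with_view(conn)
print(conn.execute("SELECT id, name FROM customer").fetchall())  # old version's query: [(1, 'Ada')]
```

In SQLite a plain view is read-only; writes through the old name would additionally need `INSTEAD OF` triggers.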
In addition to the answers above, I'll add a few more points:
The DB shouldn't be cleaned after tests, because that's where your test data, test results and all history live, which can be referred to later on.
The DB should be cleaned only if you change some application setting to run your (or any specific) test, so that it doesn't impact other testers.
In a common continuous-delivery process, the code is moving from a development instance to a staging instance to production instance.
For development purposes (reproducing bugs, testing performance with a full data set), developers most of the time fetch data from the production database into their development environment. See, for example, this question.
In my company, we use three instances besides production in our continuous delivery process:
latest: sync every night with our SCM trunk
staging: with the last released version before deployment to production
stable: with the exact same version of the software deployed in production (useful to reproduce bugs found on production)
The problem is that on the stable instance, for reproducing bugs we would like to have the exact same data set that is on production. So we would like to sync databases on a nightly basis.
Is it good practice? How do we implement it? Any pitfalls?
Depending on the data you have in production, you may not want to replicate it back to non-production environments. (Or may not even be allowed to under certain regulations.) If you have customer data, personally identifiable information (PII), regulated data, financial data, credit card data, health data, SSN, or any other type of sensitive data, if you replicate it you need the full controls you have (or should have) in production - which you probably don't, and probably don't want.
There are several virtual database (VDB) solutions I recommend you look into.
One of them is Delphix.
Windocks supports containers with integrated database cloning, and is used for just the use case described. Full disclosure, I work for Windocks.
I have a desktop app that clients are using at the moment and each client has access to their own local network database.
My manager has decided that it's best to merge these databases and have only one. All clients would then access that one database through a web service in the cloud. I would like to weigh the pros and cons before we go ahead with this decision.
One option is to add a ClientID to each table, which would give each table a composite key.
I have heard that another option would be to use schemas. Please advise how the schema approach would work and whether it is better than having a composite key in each table.
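A minimal sketch of the ClientID option, assuming SQLite via Python's `sqlite3`: `client_id` is part of every primary key, and a hypothetical `ClientScopedRepo` wrapper binds the client ID once so no query can be written without the filter. The `invoices` table is illustrative.

```python
import sqlite3

def create_schema(conn: sqlite3.Connection) -> None:
    # Every table carries client_id, and it leads the composite primary key.
    conn.execute(
        """
        CREATE TABLE invoices (
            client_id  INTEGER NOT NULL,
            invoice_id INTEGER NOT NULL,
            amount     REAL    NOT NULL,
            PRIMARY KEY (client_id, invoice_id)
        )
        """
    )

class ClientScopedRepo:
    """Hypothetical data-access wrapper: the client ID is fixed at construction,
    so callers cannot forget the WHERE client_id = ? filter."""

    def __init__(self, conn: sqlite3.Connection, client_id: int):
        self.conn = conn
        self.client_id = client_id

    def add_invoice(self, invoice_id: int, amount: float) -> None:
        self.conn.execute(
            "INSERT INTO invoices (client_id, invoice_id, amount) VALUES (?, ?, ?)",
            (self.client_id, invoice_id, amount),
        )

    def invoices(self) -> list[tuple[int, float]]:
        return self.conn.execute(
            "SELECT invoice_id, amount FROM invoices WHERE client_id = ? ORDER BY invoice_id",
            (self.client_id,),
        ).fetchall()
```

Two clients may both have an invoice 1, because uniqueness is per `(client_id, invoice_id)` pair. A wrapper like this only helps if all data access really goes through it; any hand-written query can still omit the filter.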
Thank you.
This is a seriously difficult and time consuming task. You will need to have extensive regression tests already built because the risk of things breaking is huge.
Let me tell you a story of a client that had a separate database on a separate server that got merged into another database that contained many clients. It took several months to make all the changes to convert the data. Everything looked good and it was pushed to prod. Unfortunately, the developer missed one place where the client ID needed to be referenced (it usually wasn't in the old code, since they were the only client on the server). On the first day in production, a process that sent out emails sent the client's proprietary data not only to that client's sales reps but to the sales reps of many of their competitors. Of all the places where the change could have been missed, this was the worst possible one. It harmed not only our relationship with the first client but also with all the clients that got some other client's info by mistake.
There is also the problem of migrating the data. The project for that alone (without the code changes the application will need) will take months, and then you have to consider that the clients will be adding data as you go, and the final push may run into unexpected hiccups due to new data. You may also have to turn off the old system for at least a weekend to do the production change.
Using schemas won't make it any easier, as you will then have to adjust the code to hit the correct schema per client. And when you change something, you will have to change it for each individual schema, so it tends to make the database much more difficult to maintain.
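For comparison, a sketch of the schema-per-client option, using SQLite's `ATTACH DATABASE` to give each client a named schema (client names and the `invoices` table are hypothetical). It shows the maintenance cost described above: every query must target the right schema, and one logical change is applied once per schema.

```python
import sqlite3

# Hypothetical per-client schema names. SQL identifiers cannot be bound as ?-parameters,
# so they are interpolated and must come from a trusted list, never from user input.
CLIENTS = ["client_a", "client_b"]

def attach_client_schemas(conn: sqlite3.Connection, clients: list[str]) -> None:
    for name in clients:
        # ATTACH gives each client its own named schema; ':memory:' keeps the sketch self-contained.
        conn.execute(f"ATTACH DATABASE ':memory:' AS {name}")
        conn.execute(
            f"CREATE TABLE {name}.invoices (invoice_id INTEGER PRIMARY KEY, amount REAL NOT NULL)"
        )

def invoices_for(conn: sqlite3.Connection, schema: str) -> list[tuple[int, float]]:
    # The code must always pick the correct client's schema.
    return conn.execute(
        f"SELECT invoice_id, amount FROM {schema}.invoices ORDER BY invoice_id"
    ).fetchall()

def apply_to_every_schema(conn: sqlite3.Connection, clients: list[str], ddl_template: str) -> None:
    """One logical schema change has to be repeated for each client schema."""
    for name in clients:
        conn.execute(ddl_template.format(schema=name))
```

SQLite caps the number of attached databases (10 by default), so this only illustrates the idea; server databases implement schemas natively.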
While I am a great fan of having multiple clients in one database, when you didn't start out that way, it is extremely risky and expensive to change. I would not do it at all unless I had these things:
Code in source control
Extensive Unit and regression tests
Separate dev, QA and prod environments
A process for client UAT testing
Extensive knowledge of how cloud computing and web services work (everyone I know who has moved stuff to the cloud has had some real gotchas)
A QA department
Six months to one year time frame for the project
At least one senior data analyst on the team.
In an n-tier web-app, should I be running integration tests against a different database, one dedicated to testing the code? Is it standard practice to test against the production database as well?
You should never run untested code on production. After all, you don't want to discover that it has a bug that wipes out all data. That's what tests are supposed to find. And you should not have test/staging data in the production system. It is good practice to dump the data out of production and load it into another environment for periodic testing with real-world data.
You should have a test database (not shared with production). It's a good idea to wipe out the data before every test.
You can have smoke tests that run in production. They pretend to be a user (agent) and visit many pages, maybe even create things (with a special tag so you can find them again and delete them).
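The tag-and-delete part of such a smoke test might look like this, assuming SQLite via Python's `sqlite3`; the `orders` table and the `smoke-test:` tag convention are hypothetical. Each run gets its own ID, so cleanup removes exactly what that run created. Driving the application as a user (pages, HTTP calls) is not shown.

```python
import sqlite3
import uuid

# Hypothetical convention: every record a smoke test creates carries this tag plus a per-run id.
SMOKE_TAG_PREFIX = "smoke-test:"

def create_orders_table(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL, note TEXT)"
    )

def new_run_id() -> str:
    """A unique id per smoke-test run, so concurrent runs never delete each other's data."""
    return uuid.uuid4().hex

def create_smoke_order(conn: sqlite3.Connection, run_id: str) -> None:
    """Stand-in for what a smoke test acting as a user leaves behind, tagged for later cleanup."""
    conn.execute(
        "INSERT INTO orders (customer, note) VALUES (?, ?)",
        ("smoke-test user", SMOKE_TAG_PREFIX + run_id),
    )
    conn.commit()

def delete_smoke_data(conn: sqlite3.Connection, run_id: str) -> int:
    """Remove exactly the rows created by one run; returns how many were deleted."""
    cur = conn.execute("DELETE FROM orders WHERE note = ?", (SMOKE_TAG_PREFIX + run_id,))
    conn.commit()
    return cur.rowcount
```

The tag only works if every report and downstream process either ignores it or never sees the data before cleanup, which is exactly the concern raised in the earlier answers about test data in production.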
I'd rather use a different database user with its own data set. The database schema should be the same. I'd never run tests on the production database with the same database user. Test logic shouldn't even be delivered to the client, as it may lead to severe security issues.
In my opinion, you'd need a full production-like data set for testing purposes, to be able to test every single feature of your application. You would also need an empty database (without any business data) for clients to use as a starting point on delivery. Such a dataset doesn't need to be tested, as it contains no data needed to test business logic.
AFAIK staging deployments are intended for testing Azure roles which implies that I could deploy a role with errors in code into staging. If that error damages my data I could be screwed.
How do I address that? I can't stage a role without reasonable data (hard to test it) and I can't let an unstable role damage the data.
Do I have to maintain a separate dataset for staging? How is this problem typically solved?
AFAIK staging deployments are intended for testing Azure roles which implies that I could deploy a role with errors in code into staging. If that error damages my data I could be screwed.
Staging is really designed to be a place for deployment - for spinning up new role instances prior to the instant virtual IP address swap. While you can do some testing there - e.g. making some final checks that your deployment is valid - it's not really there to allow you to do lots of testing.
How do I address that? I can't stage a role without reasonable data (hard to test it) and I can't let an unstable role damage the data.
I've generally tested on a development environment with fake data or deployed as a separate Azure service with fake data. However, I admit this has never been in the situation where I've needed huge amounts of data for testing - generally these tests have been test deployments with just 1 or 2 users.
Staging, as an environment, is meant to accurately simulate your production environment, including the data.
We have the following strategy: production is production, and staging is connected to the same DB as production, because of the way updates work in Azure. I want to be able to upgrade my staging deployment, give the client a chance to verify again, and then swap the VIPs for the deployments, transitioning the application seamlessly. For the times when there are breaking changes in the database, we decided to either create a new deployment altogether or turn off the production one, giving users a maintenance notice.
Ultimately it's your decision. But again, bearing in mind what Azure's staging is, I'd suggest keeping the data real and treating it as a beta access "program", unless of course you have other requirements. But that's beside the point.