Liquibase incremental snapshots - liquibase

We've got a rather interesting use-case where we're using Liquibase to deploy a database for our application but we're not actually in control of the database. This means that we've got to add in a lot of extra logic around each time we run Liquibase to avoid encountering any errors during the actual run. One way we've done that is that we're generating snapshots of what the DB should look like for each release of our product and then comparing that snapshot with the running DB to know that it's in a compatible state. The snapshot files for our complete database aren't gigantic but if we have to have a full one for every possible release that could cause our software package to get large in the future with dead weight.
We've looked at using the Linux patch command to create offset files as the deltas between these files will typically be very small (i.e. 1 column change, etc.) but the issues are the generated IDs in the snapshot that are not consistent across runs:
"snapshotId": "aefa109",
"table": "liquibase.structure.core.Table#aefa103"
Is there any way to force the IDs to be consistent or attack this problem in a different way?
Thanks!

Perhaps we should change how we think about PROD deployments. When I read:
This means that we've got to add in a lot of extra logic around each time we run Liquibase to avoid encountering any errors during the actual run.
This is sort of an anti-pattern in the world of Liquibase. Typically, Liquibase is used in a CI/CD pipeline and deployments of SQL are done on "lower environments" to practice for the PROD deployment (which many do not have control over, so your situation is a common one).
When we try to accommodate the possible errors during a PROD deployment, I feel we already are in a bad place with our deployment automation. We should have been testing the deploys on lower environmets that look like PROD.
For example your pipeline for your DB could look like:
DEV->QA->PROD
Create SQL for deployment in a changelog
DEV & QA seeded with restore from current state of PROD (maybe minus the row data)
You would have all control in DEV (the wild west)
Less control of QA (typically only by QA)
Iterate till you have no errors in your DEV & QA env
Deploy to PROD
If you still have errors, I would argue that you must root cause why and resolve so you can have a pipeline that is automatable.
Hope that helps,
Ronak

Related

Why cleanup a DB after a test run?

I have several test suites that read and write data from a dedicated database when they are run. My strategy is to assume that the DB is in an unreliable state before a test is run and if I need certain records in certain tables or an empty table I do that setup before the test is run.
My attitude is to not cleanup the DB at the end of each test suite because each test suite should do a cleanup and setup before it runs. Also, if I'm trying to "visually" debug a test suite it helps that the final state of the DB persists after the tests have completed.
Is there a compelling reason to cleanup a DB after your tests have run?
Depends on your tests, what happens after your tests, and how many people are doing testing.
If you're just testing locally, then no, cleaning up after yourself isn't as important ~so long as~ you're consistently employing this philosophy AND you have a process in place to make sure the database is in a known-good state before doing something other than testing.
If you're part of a team, then yes, leaving your test junk behind can screw up other people/processes, and you should clean up after yourself.
In addition to the previous answer I'd like to also mention that this is more suitable when executing Integration tests. Since Integrated modules work together and in conjunction with infrastructure such as message queues and databases + each independent part works correctly with the services it depends on.
This
cleanup a DB after a test run
helps you to Isolate Test Data. A best practice here is to use transactions for database-dependent tests (e.g.,component tests) and roll back the transaction when done. Use a small subset of data to effectively test behavior. Consider it as Database Sandbox – using the Isolate Test Data pattern. E.g. each developer can use this lightweight DML to populate his local database sandboxes to expedite test execution.
Another advantage is that you Decouple your Database, so ensure that application is backward and forward compatible with your database so you can deploy each independently. Patterns like Encapsulate Table with View, and NoSQL databases ensure that you can deploy two application versions at once without either one of them throwing database-related errors. It was particularly successful in a project where it was imperative to access the database using stored procedures.
All this is actually one of the concepts that is used in Virtual test labs.
In addition to above answers, I'll add few more points:
DB shouldn't be cleaned after test because thats where you've your test data, test results and all history which can be referred later on.
DB should be cleaned only if you are changing some application setting to run your / any specific test, so that it shouldn't impact other tester.

In a continuous delivery process, is there a proper way to automatically move data from production to development?

In a common continuous-delivery process, the code is moving from a development instance to a staging instance to production instance.
For development purpose (reproducing bugs, testing performance with a full data set), most of the time developers fetch data from production database to their development environment. See, for example, this question.
In my company, we use three instances beside production in our continuous delivery process:
latest: sync every night with our SCM trunk
staging: with the last released version before deployment to production
stable: with the exact same version of the software deployed in production (useful to reproduce bugs found on production)
The problem is that on the stable instance, for reproducing bugs we would like to have the exact same data set that is on production. So we would like to sync databases on a nightly basis.
Is it a good practice ? How to implement it ? Any pitfalls ?
Depending on the data you have in production, you may not want to replicate it back to non-production environments. (Or may not even be allowed to under certain regulations.) If you have customer data, personally identifiable information (PII), regulated data, financial data, credit card data, health data, SSN, or any other type of sensitive data, if you replicate it you need the full controls you have (or should have) in production - which you probably don't, and probably don't want.
There are several VDB solutions which I recommend you to look for.
One of them is Delphix
Windocks supports containers with integrated database cloning, and is used for just the use case described. Full disclosure, I work for Windocks.

How to continuously delivery SQL-based app?

I'm looking to apply continuous delivery concepts to web app we are building, and wondering if there any solution to protecting the database from accidental erroneous commit. For example, a bug that erases whole table instead of a single record.
How this issue impact can be limited according to continuous delivery doctorine, where the application deployed gradually over segments of infrastructure?
Any ideas?
Well first you cannot tell just from looking what is a bad SQL statement. You might have wanted to delete the entire contents of the table. Therefore is is not physiucally possible to have an automated tool that detects intent.
So to protect your database, first make sure you are in full recovery (not simple) mode and have full backups nightly and transaction log backups every 15 minutes or so. Now you cannot lose much information no matter how badly the process breaks. Your dbas should be trained to be able to recover to a point in time. If you don't have any dbas, I'd suggest the best thing you can do to protect your data is hire some. This is a non-negotiable in any non-trivial database environment and it is terribly risky not to have trained, experienced dbas if your data is critical to the business.
Next, you need to treat SQL like any other code, it should be in source control in scripts. If you are terribly concerned about accidental deletions, then write the scripts for deletes to copy all deletes to a staging table and delete the content of the staging table once a week or so. Enforce this convention in the code reviews. Or better yet set up an auditing process that runs through triggers. Once all records are audited, it is much easier to get back the 150 accidental deletions without having to restore a database. I would never consider having any enterprise application without auditing.
All SQL scripts without exception should be code-reviewed just like other code. All SQL scripts should be tested on QA and passed before moving to porduction. This will greatly reduce the possiblility for error. No developer should have write rights to production, only dbas should have that. Therefore each script should be written so that is can just be run, not run one chunk at a time where you could accidentally forget to highlight the where clause. Train your developers to use transactions correctly in the scripts as well.
Your concern is bad data happening to the database. The solution is to use full logging of all transactions so you can back out of transactions that you want to. This would usually be used in a context of full backups/incremental backups/full logging.
SQL Server, for instance, allows you to restore to a point in time (http://msdn.microsoft.com/en-us/library/ms190982(v=sql.105).aspx), assuming you have full logging.
If you are creating and dropping tables, this could be an expensive solution, in terms of space needed for the log. However, it might meet your needs for development.
You may find that full-logging is too expensive for such an application. In that case, you might want to make periodic backups (daily? hourly?) and just keep these around. For this purpose, I've found LightSpeed to be a good product for fast and efficient backups.
One of the strategies that is commonly adopted is to log the incremental sql statements rather than a collective schema generation so you can control the change at a much granular levels:
ex:
change 1:
UP:
Add column
DOWN:
Remove column
change 2:
UP:
Add trigger
DOWN:
Remove trigger
Once the changes are incrementally captured like this, you can have a simple but efficient script to upgrade (UP) from any version to any version without having to worry about the changes that happening. When the change # are linked to build, it becomes even more effective. When you deploy a build the database is also automatically upgraded(UP) or downgraded(DOWN) to that specific build.
We have an pipeline app which does that at CloudMunch.

Redis databases on a dev machine with multiple projects

How do you manage multiple projects on your development and/or testing machine, when some of those projects use Redis databases?
There are 2 major problems:
Redis doesn't have named databases (only numbers 0-16)
Tests are likely to execute FLUSHDB on each run
Right now, I think we have three options:
Assign different databases for each project, each dev and test environment
Prefix keys with a project name using something like redis-namespace
Nuke and seed the databases anytime you switch between projects
The first one is problematic if multiple projects assign "0" for the main use and "1" for the test and such. Even if Project B decided to change to "2" and "3", another member in the project might have a conflict in another projects for him. In other words, that approach is not SCM friendly.
For the second one, it's a bad idea simply because it adds needless overhead on runtime performance and memory efficiency. And no matter what you do, another project might be already using the same key coincidentally when you joined the project.
The third option is rather a product of compromise, but sometimes I want to keep my local data untouched while I deploy small patches for another projects.
I know this could be a feature request for Redis, but I need a solution now.
Any ideas, practices?
If the projects are independent and so do not need to share data, it is much better to use multiple redis instances - each project configuration has a port number rather than a database name/id. Create an appropriately named config file and startup script for each one so that you can get whichever instance you need running with a single click.
Make sure you update the save settings in each config file as well as setting the ports - Multiple instances using the same dump.rdb file will work, but lead to some rather confusing bugs.
I also use separate instances for development and testing so that the test instance never writes anything to disk and can be flushed at the start of each test.
Redis is moving away from multiple databases, so I would recommend you start migrating put of that mechanism sooner rather than later. This means one instance per db. Given the very low overhead of running Redis, this isn't a problem from a resources standpoint.
That said, you can specify the number of databases, and providing A naming standard would work. For example, configure redis to have say, 60 DBS and you add 10 for the test db. For example db3 uses db13 for testing.
It sounds like your dev, test, and prod environments are pretty tied together. If so, I'd suggest moving away from that. Using separate instances is the easiest route to that, and provides protection against cross purpose contamination. Between this and the future of redis being single-db per instance, separate instances is the best route.

Recommended approach how to modify schema of a production SQL database?

Say there is a database with 100+ tables and a major feature is added, which requires 20 of existing tables to be modified and 30 more added. The changes were done over a long time (6 months) by multiple developers on the development database. Let's assume the changes do not make any existing production data invalid (e.g. there are default values/nulls allowed on added columns, there are no new relations or constraints that could not be fulfilled).
What is the easiest way to publish these changes in schema to the production database? Preferably, without shutting the database down for an extended amount of time.
Write a T-SQL script that performs the needed changes. Test it on a copy of your production database (restore from a recent backup to get the copy). Fix the inevitable mistakes that the test will discover. Repeat until script works perfectly.
Then, when it's time for the actual migration: lock the DB so only admins can log in. Take a backup. Run the script. Verify results. Put DB back online.
The longest part will be the backup, but you'd be crazy not to do it. You should know how long backups take, the overall process won't take much longer than that, so that's how long your downtime will need to be. The middle of the night works well for most businesses.
There is no generic answer on how to make 'changes' without downtime. The answer really depends from case to case, based on exactly what are the changes. Some changes have no impact on down time (eg. adding new tables), some changes have minimal impact (eg. adding columns to existing tables with no data size change, like a new nullable column that doe snot increase the null bitmap size) and other changes will wreck havoc on down time (any operation that will change data size will force and index rebuild and lock the table for the duration). Some changes are impossible to apply without *significant * downtime. I know of cases when the changes were applies in parallel: a copy of the database is created, replication is set up to keep it current, then the copy is changed and kept in sync, finally operations are moved to the modified copy that becomes the master database. There is a presentation at PASS 2009 given by Michelle Ufford that mentions how godaddy gone through such a change that lasted weeks.
But, at a lesser scale, you must apply the changes through a well tested script, and measure the impact on the test evaluation.
But the real question is: is this going to be the last changes you ever make to the schema? Finally, you have discovered the perfect schema for the application and the production database will never change? Congratulation, once you pull this off, you can go to rest. But realistically, you will face the very same problem in 6 months. the real problem is your development process, with developers and making changes from SSMS or from VS Server Explored straight into the database. Your development process must make a conscious effort to adopt a schema change strategy based on versioning and T-SQL scripts, like the one described in Version Control and your Database.
Use a tool to create a diff script and run it during a maintenance window. I use RedGate SQL Compare for this and have been very happy with it.
I've been using dbdeploy successfully for a couple of years now. It allows your devs to create small sql change deltas which can then be applied against your database. The changes are tracked by a changelog table within database so that it knows what to apply.
http://dbdeploy.com/