Does Liquibase have a method to test a rollback strategy without actually executing it?

I am brand new to Liquibase...
Today I wrote a Liquibase changeset using --liquibase formatted sql.
I created two tables where the second had a foreign key dependency on the first.
My rollback strategy was (mistakenly) drop table1; drop table2. When I ran the update and tested the rollback, it failed because of the foreign key constraint. However, when I corrected my mistake and attempted to rerun it, it failed because the checksum didn't match.
I know that the obvious answer is to make more atomic changesets, however...
Does Liquibase support a way to test this sort of thing without actually running it so I can avoid the problem with the checksum on the edited rollback?
Failing that: is there a workaround for the checksum problem that will let me edit my files after running the update? (ctrl+z?)
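For reference, the corrected changeset would look something like this in formatted SQL (a sketch; the author name, id, and column definitions are made up). Multiple --rollback lines run in order, so the dependent table must be dropped first:

--liquibase formatted sql

--changeset alice:1
CREATE TABLE table1 (
    id INT PRIMARY KEY
);
CREATE TABLE table2 (
    id INT PRIMARY KEY,
    table1_id INT REFERENCES table1 (id)
);
--rollback DROP TABLE table2;
--rollback DROP TABLE table1;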

The short answer to your question is: no, Liquibase doesn't have such a thing.
Liquibase is a great toolkit, but it doesn't have a lot of bells and whistles, and it doesn't have much of an 'opinion' on how it should be used or what your workflow should be. In your case, one way to deal with the problem is to just drop the database and re-create it from the changelog. If you have already deployed the changelog in multiple places, that might not be possible.
There is also a validCheckSum attribute you can specify on a changeset, so you could use that, but in general it makes your changelog more complicated.
If you want to look at something more full-featured, with the ability to forecast changes before they are deployed, please check out my company's product, Datical DB. It uses Liquibase at its core, but adds a whole lot more (and is priced accordingly).

Liquibase provides the updateTestingRollback command, which basically updates the database, then tests the rollback and, if successful, applies the changes again.
Your problem with invalid checksums may be solved with the clearCheckSums command. It removes the current checksums from the database; on the next update, change sets that have already been deployed will have their checksums recomputed, and change sets that have not been deployed will be deployed.
For more details, check the Liquibase commands documentation.
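For example (connection options such as --url and credentials omitted; the command spelling varies slightly between the Liquibase 3.x and 4.x CLIs):

liquibase --changeLogFile=changelog.sql updateTestingRollback
liquibase clearCheckSums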

Related

How to track data changes when deploying a SQL SSDT project

We upgrade our databases using SSDT DACPAC deployments. Developers work on the projects in Visual Studio 2015, modifying the schema where needed.
Developers also add Pre and Post deployment scripts to the projects. Some of these scripts ensure that certain tables always have the expected data in them. Others add, move, or mutate data as part of the platform upgrade.
We need to improve the output generated during database deployments so that, after a deployment, we have a human readable list of any data that changed as part of the deployment.
I'm currently considering two approaches. But neither seems ideal. They are:
1) Manually add logging to all Pre and Post scripts in the project. This is certainly an option. But it is not ideal because it complicates the upgrade scripts and is also likely to sometimes be missed or incorrectly done by developers. Since the goal is to detect unexpected data changes that occurred as part of the deployment, this uncertainty is a real bummer. A generic solution is preferable.
2) Here's my best stab at a generic solution: as part of the deployment process, enable SQL Change Data Capture for all user tables in the database. Then, at the end of the deployment, collect all captured changes and disable CDC. I've actually got this working. But the process of enabling CDC on all tables in the database takes a few minutes (one of our databases has 775 tables; it takes about 3 minutes to enable CDC on all of them). This approach also just feels very... heavy?
My question is: is there a better way to achieve my goal of reliably generating a report of data that was changed as part of the database deployment, given that the deployment runs arbitrary Pre and Post deployment scripts?
If there does not seem to be a better way, I would appreciate feedback on option #2. Am I crazy for considering this?
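(For reference, option #2 boils down to something like the following T-SQL sketch; the cursor loop, the NULL gating role, and the default capture-instance names are assumptions:)

-- enable CDC at the database level first
EXEC sys.sp_cdc_enable_db;

-- then enable capture for every user table not already tracked
DECLARE @schema sysname, @table sysname;
DECLARE tab_cur CURSOR FOR
    SELECT s.name, t.name
    FROM sys.tables t
    JOIN sys.schemas s ON s.schema_id = t.schema_id
    WHERE t.is_ms_shipped = 0 AND t.is_tracked_by_cdc = 0;
OPEN tab_cur;
FETCH NEXT FROM tab_cur INTO @schema, @table;
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC sys.sp_cdc_enable_table
        @source_schema = @schema,
        @source_name   = @table,
        @role_name     = NULL;  -- no gating role
    FETCH NEXT FROM tab_cur INTO @schema, @table;
END
CLOSE tab_cur;
DEALLOCATE tab_cur;

-- after the deployment: read the cdc.* change tables, then EXEC sys.sp_cdc_disable_db;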
"Some of these scripts ensure that certain tables always have the expected data in them"
First of all, Pre-Deploy/Post-Deploy scripts are not validated. I prefer to use them as a launching point only, and do the actual work inside stored procedures.
So instead of writing:
INSERT INTO dbo.tab1(id, col1, col2) VALUES (...,..., ...);
and inserting it into the Post-Deployment script, you could put:
EXEC dbo.Populate_Tab1;
and define the stored procedure as:
CREATE PROCEDURE dbo.Populate_Tab1
AS
BEGIN
    -- idempotent script, here by using MERGE
    -- (the column values below are placeholder sample data)
    WITH src(id, col1, col2) AS (
        SELECT 1, 'a', 'x' UNION ALL
        SELECT 2, 'b', 'y'
    )
    MERGE dbo.tab1 AS trg
    USING src
        ON trg.id = src.id
    WHEN MATCHED THEN
        UPDATE SET col1 = src.col1, col2 = src.col2
    WHEN NOT MATCHED BY SOURCE THEN
        DELETE
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (id, col1, col2) VALUES (src.id, src.col1, src.col2);
END
Key point: the stored procedure must be idempotent.
This way you are always sure that the table contains the desired input data, and the stored procedure is validated.
A similar approach is described by Kamil Nowinski:
Script and deploy the data for database from SSDT project
Advantages:
The stored procedures are part of the database project.
They will be validated and compiled, so you avoid potential errors from uncontrolled code.
The changes to the SP appear in the output script only when something changes.
It's easier to review the script before a run, as it doesn't contain unnecessary code.
In the Post-Deployment script there is only one line of code, which never changes.

SQL (or any relational db) engine with SCM-friendly backing store [duplicate]

I'm doing a web app, and I need to make a branch for some major changes. The thing is, these changes require changes to the database schema, so I'd like to put the entire database under git as well.
How do I do that? Is there a specific folder that I can keep under a git repository? How do I know which one? How can I be sure that I'm choosing the right folder?
I need to be sure, because these changes are not backward compatible; I can't afford to screw up.
The database in my case is PostgreSQL
Edit:
Someone suggested taking backups and putting the backup file under version control instead of the database. To be honest, I find that really hard to swallow.
There has to be a better way.
Update:
OK, so there's no better way, but I'm still not quite convinced, so I will change the question a bit:
I'd like to put the entire database under version control, what database engine can I use so that I can put the actual database under version control instead of its dump?
Would sqlite be git-friendly?
Since this is only the development environment, I can choose whatever database I want.
Edit2:
What I really want is not to track my development history, but to be able to switch from my "new radical changes" branch to the "current stable branch" and be able, for instance, to fix some bugs/issues with the current stable branch, such that when I switch branches, the database auto-magically becomes compatible with the branch I'm currently on.
I don't really care much about the actual data.
Take a database dump, and version control that instead. This way it is a flat text file.
Personally, I suggest that you keep both a data dump and a schema dump. This way, using diff, it becomes fairly easy to see what changed in the schema from revision to revision.
If you are making big changes, you should have a secondary database that you make the new schema changes to, leaving the old one untouched since, as you said, you are making a branch.
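In Postgres terms that is a pair of one-liners ('mydb' is a placeholder database name):

$ pg_dump --schema-only mydb > schema.sql
$ pg_dump --data-only mydb > data.sql

Both outputs are plain text, so diff works on them directly.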
I'm starting to think of a really simple solution, don't know why I didn't think of it before!!
Duplicate the database (both the schema and the data).
In the branch for the new-major-changes, simply change the project configuration to use the new duplicate database.
This way I can switch branches without worrying about database schema changes.
EDIT:
By duplicate, I mean create another database with a different name (like my_db_2); not doing a dump or anything like that.
Use something like Liquibase; this lets you keep your Liquibase files under revision control. You can tag changes for production only, and have Liquibase keep your DB up to date for either production or development (or whatever scheme you want).
Irmin (branching + time travel)
Flur.ee (immutable + time travel + graph query)
XTDB (formerly called 'CruxDB') (time travel + query)
TerminusDB (immutable + branching + time travel + Graph Query!)
DoltDB (branching + time-travel + SQL query)
Quadrable (branching + remote state verification)
EdgeDB (no real time travel, but migrations derived by the compiler after schema changes)
Migra (diffing for Postgres schemas/data. Auto-generate migration scripts, auto-sync db state)
ImmuDB (immutable + time-travel)
I've come across this question as I've got a similar problem, where something approximating a DB-based directory structure stores 'files', and I need git to manage it. It's distributed across a cloud, using replication, hence its access point will be via MySQL.
The gist of the above answers is to suggest an alternative solution to the problem asked, which kind of misses the point of using Git to manage something in a database, so I'll attempt to answer that question.
Git is a system which, in essence, stores a database of deltas (differences) that can be reassembled, in order, to reproduce a context. The normal usage of git assumes that context is a filesystem, and those deltas are diffs in that filesystem, but really, all git is is a hierarchical database of deltas (hierarchical because, in most cases, each delta is a commit with at least one parent, arranged in a tree).
As long as you can generate a delta, in theory git can store it. The problem is that normally git expects the context on which it's generating deltas to be a filesystem, and similarly, when you check out a point in the git hierarchy, it expects to generate a filesystem.
If you want to manage change in a database, you have 2 discrete problems, and I would address them separately (if I were you). The first is schema, the second is data (although in your question you state data isn't something you're concerned about). A problem I had in the past was a Dev and Prod database, where Dev could take incremental changes to the schema, and those changes had to be documented in CVS and propagated to live, along with additions to one of several 'static' tables. We did that by having a third database, called Cruise, which contained only the static data. At any point the schemas from Dev and Cruise could be compared, and we had a script to take the diff of those two schema dumps and produce an SQL file containing ALTER statements to apply it. Similarly, any new data could be distilled to an SQL file containing INSERT commands. As long as fields and tables are only added, and never deleted, the process could automate generating the SQL statements to apply the delta.
The mechanism by which git generates deltas is diff, and the mechanism by which it combines one or more deltas with a file is called merge. If you can come up with a method for diffing and merging from a different context, git should work, but as has been discussed you may prefer a tool that does that for you. My first thought towards solving that is https://git-scm.com/book/en/v2/Customizing-Git-Git-Configuration#External-Merge-and-Diff-Tools which details how to replace git's internal diff and merge tools. I'll update this answer as I come up with a better solution to the problem, but in my case I expect to only have to manage data changes, insofar as a DB-based filestore may change, so my solution may not be exactly what you need.
There is a great project called Migrations under Doctrine that was built just for this purpose.
It's still in alpha state and built for PHP.
http://docs.doctrine-project.org/projects/doctrine-migrations/en/latest/index.html
Take a look at RedGate SQL Source Control.
http://www.red-gate.com/products/sql-development/sql-source-control/
This tool is a SQL Server Management Studio snap-in which will allow you to place your database under Source Control with Git.
It's a bit pricey at $495 per user, but there is a 28 day free trial available.
NOTE
I am not affiliated with RedGate in any way whatsoever.
I've released a tool for sqlite that does what you're asking for. It uses a custom diff driver leveraging the sqlite project's 'sqldiff' tool, UUIDs as primary keys, and leaves off the sqlite rowid. It is still in alpha, so feedback is appreciated.
Postgres and mysql are trickier, as the binary data is kept in multiple files and may not even be valid if you were able to snapshot it.
https://github.com/cannadayr/git-sqlite
I want to make something similar, i.e. add my database changes to my version control system.
I am going to follow the ideas in this post from Vladimir Khorikov, "Database versioning best practices". In summary, I will:
Store both the schema and the reference data in a source control system.
For every modification, create a separate SQL script with the changes.
In case it helps!
You can't do it without atomicity, and you can't get atomicity without either using pg_dump or a snapshotting filesystem.
My postgres instance is on zfs, which I snapshot occasionally. It's approximately instant and consistent.
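For example (the pool/dataset name is a placeholder):

$ zfs snapshot tank/pgdata@pre-deploy

Restoring such a snapshot behaves like Postgres crash recovery, so it is consistent as long as the entire data directory (including WAL) lives in the snapshotted dataset.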
I think X-Istence is on the right track, but there are a few more improvements you can make to this strategy. First, use:
$ pg_dump --schema-only ...
to dump the tables, sequences, etc., and place this file under version control. You'll use this to separate the compatibility changes between your branches.
Next, perform a data dump for the set of tables that contain configuration required for your application to operate (you should probably skip user data, etc.), like form defaults and other non-user-modifiable data. You can do this selectively by using:
$ pg_dump --table=... (or --exclude-table=...)
This is a good idea because the repo can get really clunky when your database gets to 100MB+ if you do a full data dump. A better idea is to back up the more minimal set of data that you require to test your app. If your default data is very large, though, this may still cause problems.
If you absolutely need to place full backups in the repo, consider doing it in a branch outside of your source tree. An external backup system with some reference to the matching svn rev is likely best for this though.
Also, I suggest using text format dumps over binary for revision purposes (for the schema at least) since these are easier to diff. You can always compress these to save space prior to checking in.
Finally, have a look at the postgres backup documentation if you haven't already. The way you're commenting on backing up 'the database' rather than a dump makes me wonder if you're thinking of file system based backups (see section 23.2 for caveats).
What you want, in spirit, is perhaps something like Post Facto, which stores versions of a database in a database. Check this presentation.
The project apparently never really went anywhere, so it probably won't help you immediately, but it's an interesting concept. I fear that doing this properly would be very difficult, because even version 1 would have to get all the details right in order to have people trust their work to it.
This question is pretty much answered, but I would like to complement X-Istence's and Dana the Sane's answers with a small suggestion.
If you need revision control with some degree of granularity, say daily, you could couple the text dump of both the tables and the schema with a tool like rdiff-backup which does incremental backups. The advantage is that instead of storing snapshots of daily backups, you simply store the differences from the previous day.
With this you get the advantage of revision control while not wasting too much space.
In any case, using git directly on big flat files which change very frequently is not a good solution. If your database becomes too big, git will start to have some problems managing the files.
Here is what I am trying to do in my projects:
Separate the data, the schema, and the default data.
The database configuration is stored in a configuration file that is not under version control (.gitignore).
The database defaults (for setting up new projects) are a simple SQL file under version control.
For the database schema, keep a database schema dump under version control.
The most common way is to have update scripts that contain SQL statements (ALTER TABLE ... or UPDATE ...). You also need to have a place in your database where you save the current version of your schema (a minimal sketch of that bookkeeping follows below).
Take a look at other big open-source database projects (Piwik, or your favorite CMS system); they all use update scripts (1.sql, 2.sql, 3.sh, 4.php, 5.sql).
But this is a very time-intensive job: you have to create and test the update scripts, and you need to run a common update script that compares the version and runs all necessary update scripts.
So theoretically (and that's what I am looking for) you could
dump the database schema after each change (manually, cron job, git hooks (maybe before commit))
(and only in some very special cases create update scripts)
After that, your common update script would run the normal update scripts for the special cases, then compare the schemas (the dump and the current database) and automatically generate the necessary ALTER statements. There are some tools that can do this already, but I haven't found a good one yet.
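A minimal sketch of the version bookkeeping mentioned above (table and column names are made up):

CREATE TABLE schema_version (
    version INT NOT NULL
);
INSERT INTO schema_version (version) VALUES (0);

-- each update script ends by bumping the recorded version, e.g. at the end of 4.sql:
UPDATE schema_version SET version = 4;

The common update script then compares this value with the available script numbers and runs, in order, whichever ones are missing.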
What I do in my personal projects is store my whole database in Dropbox and then point my MAMP/WAMP workflow to use it right from there. That way the database is always up to date wherever I need to do some development. But that's just for dev! Live sites use their own server for that, of course! :)
Storing each level of database changes under git versioning control is like pushing your entire database with each commit and restoring your entire database with each pull.
If your database is so prone to crucial changes that you cannot afford to lose them, you can just update your pre_commit and post_merge hooks.
I did the same with one of my projects and you can find the directions here.
That's how I do it:
Since you have free choice about the DB type, use a file-based DB, e.g. Firebird.
Create a template DB which has the schema that fits your actual branch and store it in your repository.
When executing your application, programmatically create a copy of your template DB, store it somewhere else, and just work with that copy.
This way you can put your DB schema under version control without the data. And if you change your schema, you just have to change the template DB.
We used to run a social website on a standard LAMP configuration. We had a Live server, a Test server, and a Development server, as well as the local developers' machines. All were managed using Git.
On each machine, we had the PHP files, but also the MySQL service, and a folder with images that users would upload. The Live server grew to have some 100K (!) recurrent users, the dump was about 2GB (!), and the image folder was some 50GB (!). By the time I left, our server was reaching the limit of its CPU, RAM, and most of all the concurrent network connection limits (we even compiled our own version of the network card driver to max out the server, lol). We could not (nor should you assume with your website) put 2GB of data and 50GB of images in Git.
To manage all this under Git easily, we would ignore the binary folders (the folders containing the images) by inserting those folder paths into .gitignore. We also had a folder called SQL outside the Apache document-root path. In that SQL folder, we would put the SQL files from the developers in incremental numberings (001.florianm.sql, 001.johns.sql, 002.florianm.sql, etc). These SQL files were managed by Git as well. The first SQL file would indeed contain a large set of DB schema. We didn't add user data to Git (e.g. the records of the users table, or the comments table), but data like configs or topology or other site-specific data was maintained in the SQL files (and hence by Git). Mostly it's the developers (who know the code best) who determine what is and is not maintained by Git with regard to SQL schema and data.
When it got to a release, the administrator logged in to the dev server, merged the live branch with all developers' needed branches on the dev machine into an update branch, and pushed it to the test server. On the test server, he checked whether the updating process for the Live server was still valid and, in quick succession, pointed all traffic in Apache to a placeholder site, created a DB dump, pointed the working directory from 'live' to 'update', executed all new SQL files against MySQL, and repointed the traffic back to the correct site. When all stakeholders agreed after reviewing the test server, the administrator did the same thing from the Test server to the Live server. Afterwards, he merged the live branch on the production server into the master branch across all servers, and rebased all live branches. The developers were responsible for rebasing their own branches, but they generally knew what they were doing.
If there were problems on the test server, e.g. the merges had too many conflicts, then the code was reverted (pointing the working branch back to 'live') and the SQL files were never executed. The moment the SQL files were executed, this was considered a non-reversible action. If the SQL files were not working properly, then the DB was restored using the dump (and the developers were told off for providing ill-tested SQL files).
Today, we maintain both an sql-up and an sql-down folder, with equivalent filenames, where the developers have to test that the upgrading SQL files can be equally downgraded. This could ultimately be executed with a bash script, but it's a good idea if human eyes keep monitoring the upgrade process.
It's not great, but it's manageable. Hope this gives an insight into a real-life, practical, relatively high-availability site. It may be a bit outdated, but it's still followed.
Update Aug 26, 2019:
Netlify CMS is doing it with GitHub; an example implementation can be found in netlify-cms-backend-github, with all the information on how they implemented it.
I say don't. Data can change at any given time. Instead you should only commit data models in your code, schema and table definitions (create database and create table statements) and sample data for unit tests. This is kinda the way that Laravel does it, committing database migrations and seeds.
I would recommend neXtep (link removed - domain was taken over by a NSFW website) for version-controlling the database. It has a good set of documentation and forums that explain how to install it and the errors encountered. I have tested it for PostgreSQL 9.1 and 9.3; I was able to get it working for 9.1, but for 9.3 it doesn't seem to work.
Use a tool like iBatis Migrations (manual, short tutorial video) which allows you to version control the changes you make to a database throughout the lifecycle of a project, rather than the database itself.
This allows you to selectively apply individual changes to different environments, keep a changelog of which changes are in which environments, create scripts to apply changes A through N, rollback changes, etc.
"I'd like to put the entire database under version control, what database engine can I use so that I can put the actual database under version control instead of its dump?"
This is not database-engine dependent. For Microsoft SQL Server there are lots of version-control programs. I don't think this problem can be solved with git; you'd have to use a PostgreSQL-specific schema version control system. I don't know whether such a thing exists or not...
Use a version-controlled database, of which there are now several.
https://www.dolthub.com/blog/2021-09-17-database-version-control/
These products don't apply version control on top of another type of database -- they are their own database engines that support version control operations. So you need to migrate to them or start building on them in the first place.
I write one of them, DoltDB, which combines the interfaces of MySQL and Git. Check it out here:
https://github.com/dolthub/dolt
I wish it were simpler. Checking in the schema as a text file is a good start to capture the structure of the DB. For the content, however, I have not found a cleaner, better method for git than CSV files. One per table. The DB can then be edited on multiple branches and merges extremely well.
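For Postgres, the per-table CSV export can be a one-liner (a sketch; the table name and path are placeholders, and psql's client-side \copy works too if you lack server file access):

COPY (SELECT * FROM table1 ORDER BY id) TO '/path/to/repo/table1.csv' WITH (FORMAT csv, HEADER);

Exporting in primary-key order keeps row order stable between exports, which keeps the git diffs small.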

Liquibase: Deleting Sql File used in Changeset

We are using Liquibase for our DB deployment. We are currently doing some refactoring. There is an old changeset which runs an SQL script; this has been executed in the past. Now this SQL script (or the statement in it) is outdated, and I would like to delete it from our VCS. I did that and tried to deploy again, but I am getting an exception saying that the SQL file is not there, although my changeset has runAlways="false" and runOnChange="false".
Does this mean that I always have to keep the specific SQL script in my VCS, although it is deprecated?
Given how checksums work in Liquibase, you can't delete it. Technically, you ran that changeset once and something happened in the DB, and Liquibase created a checksum for your file. Now if you change some part of it, Liquibase will say your file has been updated, and you can't do that.
So in summary, your file is part of the history of your application build, even if it is no longer useful, and you can't get rid of it.
If you are not in production, one solution could be to create a dump of your current database as init.sql and start from that.
I hope this helps you.

Using Liquibase and cherry-picking changesets

So I want to use Liquibase as a replacement for SQL scripts for preparing databases in different development environments (SIT -> UAT -> PROD). The plan is to execute a Liquibase update (with some other parameters in place if necessary) before starting the testing.
The caveat is that all files (including the Liquibase XML) that are to be submitted to UAT and PROD must have been frozen; i.e. there can be no change to any file that has successfully passed SIT. Is there any way I can do this, so that in UAT I can only execute changesets which have successfully passed SIT (and similarly, in PROD I can only execute changesets which have successfully passed UAT), without actually altering the Liquibase XML file?
Thanks.
UPDATE
There are several issues which are inherent in the current development cycle:
It would be redundant to ask the developers to run SIT again, this time with context=sit added to the changesets.
Developers only want to test their own changesets in UAT. So a developer is responsible only for his own changesets, meaning they don't want to run others' changesets, even if those changesets have successfully passed SIT. The same issue also applies for UAT -> PROD.
Sorry I was not clear on this issue beforehand. I was tasked to implement Liquibase at my current workplace, and I don't have a really good picture of what's actually happening in the cycle.
Liquibase does not allow you to pick certain changeSets to execute. The main reason for this is because the order that changes run against a database can make a big difference. Normally it doesn't help to have developers run just their changeSets because the database changes created by others are still needed by the application.
I think the most common way to handle your scenario is to rely on the same version control practices you use for your codebase. Liquibase is designed as a simple text format so that the changelog files can be stored in version control along with your code. Then, you can have branches for UAT and PROD and you can control what is going into those branches, including what changeSets are in the changelogs.
I think the best option would be to use contexts (http://www.liquibase.org/documentation/contexts.html). Changesets that have passed SIT can be marked with context="sit". Then, when you update UAT and PROD, run with context=sit and only the tagged changesets will execute.
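In formatted SQL, the tagging looks roughly like this (a sketch; the author, id, and DDL are made up):

--changeset alice:42 context:sit
ALTER TABLE customer ADD middle_name VARCHAR(50);

The UAT or PROD update is then run with the context selected (connection options omitted):

liquibase --contexts=sit update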
I think that, based on your valid requirement that all scripts be frozen in file-based version control external to Liquibase, you have a major challenge here.
Liquibase cannot guarantee that files are frozen - that is not for Liquibase to know.
You are welcome to review DBmaestro Teamwork, which gives you enforced version control of your database objects (guaranteeing that the repository and the workspace database are in sync). It also generates the delta scripts and handles all the merges (between different environments, UAT critical fixes, branches) from changes not originating in the development environment.
Disclaimer: I work at DBmaestro.

Can you control Liquibase updateSQL output by major release?

We use Liquibase to generate database changes, but we need these scripted into SQL files because our DB server has neither Liquibase nor even a JVM installed.
We use the updateSQL command to create the DDL script needed to make changes; however, if we first run a 'dropAll' (on our development server), we get a change set for every release.
Is there a way to run Liquibase's regular 'update' for one collection of changesets (i.e. all prior releases) and then produce updateSQL output only for the last release? Essentially, can we parameterize our build process to indicate the release we are targeting and automatically produce only the SQL for that release?
Thanks
We have a similar setup.
The way we handle this is to have an "integration" DB that always keeps the last official release.
When we have a new release candidate, we let Liquibase run updateSQL against that integration DB. Since it is on the last (resp. the current) release, updateSQL will only write out the difference between the new release candidate and the last release.
So you have a delta DDL that needs to be applied to go from release x to y.
Once the release candidate is released, we also let Liquibase update the integration DB.
There are several ways to run a portion of your changelog.
If you break up your changelog files by version (changelog-1.0.xml, changelog-1.1.xml, changelog-2.0.xml) and then have a master changelog.xml that <include>'s them, then the easiest solution may be to just run updateSql, passing in the changelog version you want. If you want to generate SQL for the entire database, run "liquibase --changeLogFile=master.changelog.xml updateSql". If you want to generate SQL for just version 2.0, run "liquibase --changeLogFile=changelog-2.0.xml updateSql".
Jens' answer above works well and is what I often suggest. Compared to the option above, it has the advantage of catching new changesets that were introduced into old changelog versions through a merged-in patch release, for example.
Beyond that, you could use contexts or preconditions to dynamically control what is run. Depending on your setup, there may be ways to get those to work well for you.
Finally, there is always the extension system, where you can write custom logic around how changelogs are parsed and executed, to pull out older versions' changesets based on file name or some other mechanism that works for you.