Managing database updates

I've been thinking of ways to improve managing changes to our database structure. I have a build server that creates nightly builds, so I was thinking we could somehow create database dumps, backups, and scripts from the test environment as part of the build process. Then when deploying an update to the client we could use a tool like DBDiff to create the database update script.
Is anybody doing something similar? Is it even a good idea? Any tips on what to use to create these dumps on the build server?

Rather than identifying the differences, I recommend keeping a proper script that creates the database from scratch.
We are quite satisfied with using Liquibase to manage all DB migration in our projects. It knows which "patches" have been applied and ensures that only those that are missing will be applied to the target database.
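The bookkeeping idea described here (remember which patches ran, apply only the missing ones) can be sketched roughly as follows. This is not Liquibase's implementation or API: the `applied_patches` table, the `PATCHES` list and `apply_missing` are made-up names, and SQLite stands in for the real database.

```python
import sqlite3

# Hypothetical ordered list of patches: (unique id, SQL to run).
PATCHES = [
    ("001-create-customer", "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)"),
    ("002-add-email", "ALTER TABLE customer ADD COLUMN email TEXT"),
]

def apply_missing(conn, patches):
    """Apply only the patches not yet recorded in the tracking table."""
    conn.execute("CREATE TABLE IF NOT EXISTS applied_patches (id TEXT PRIMARY KEY)")
    done = {row[0] for row in conn.execute("SELECT id FROM applied_patches")}
    applied_now = []
    for patch_id, sql in patches:
        if patch_id in done:
            continue  # already applied to this database, skip it
        conn.execute(sql)
        conn.execute("INSERT INTO applied_patches (id) VALUES (?)", (patch_id,))
        applied_now.append(patch_id)
    conn.commit()
    return applied_now

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(apply_missing(conn, PATCHES))  # first run applies every patch
    print(apply_missing(conn, PATCHES))  # second run finds nothing left to do
```

Because the tracking table lives in the target database, the same patch list can be run against databases at different versions and each one only receives what it is missing.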

This is possible.
The differencing is the hard part. Once you identify the differences, you need to construct the appropriate SQL and then apply it. You can either apply it directly or generate a script that you run after review.
When both sides have changed, you need to decide whether the target system keeps its change or whether that change is removed completely.
Remember that changes on the target system also include data: if you remove a table or column, your referential integrity might be completely ruined.
One more thought: you will need access to the target system in order to determine the diff. If this is a generic utility, you will need to make it an executable that runs after the fact, not part of the build.
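To make the "identify the differences, then construct the SQL for review" step concrete, here is a deliberately tiny sketch. It assumes SQLite on both sides and only looks at tables and columns; `columns` and `diff_script` are hypothetical helpers, and a real diff tool handles far more (types, constraints, indexes, procedures).

```python
import sqlite3

def columns(conn):
    """Return {table: {column: declared type}} for every user table."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
    return {t: {r[1]: r[2] for r in conn.execute(f'PRAGMA table_info("{t}")')}
            for t in tables}

def diff_script(source, target):
    """List statements and review notes that would bring target closer to source."""
    src, tgt = columns(source), columns(target)
    script = []
    for table, cols in sorted(src.items()):
        if table not in tgt:
            script.append(f"-- table {table} is missing on target: needs CREATE TABLE")
            continue
        for col, col_type in cols.items():
            if col not in tgt[table]:
                script.append(f"ALTER TABLE {table} ADD COLUMN {col} {col_type}")
    # Objects that exist only on the target are flagged, never dropped automatically.
    for table in sorted(set(tgt) - set(src)):
        script.append(f"-- table {table} exists only on target: review before dropping")
    return script
```

Note that it needs a live connection to the target, and that it only emits text for a human to review rather than executing anything, which matches the caution above about target-side changes and data.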

You will find the Visual Database Tools very useful here.
http://msdn.microsoft.com/en-us/library/y5a4ezk9.aspx
There is a schema compare built right into Visual Studio (it can also be run from the command line). There is also a database project that contains a complete set of scripts for the database and the objects that it contains. This can be checked into source control along with your source code.
You can deploy a new database based on these scripts with a context menu click.

Have a look at http://www.codeproject.com/KB/architecture/Database_CI.aspx and http://www.martinfowler.com/articles/evodb.html - there's a fair amount of thinking that's already available.

We are currently looking at the Juneau CTP release, SQL Tools for Visual Studio. It has a snapshot and schema comparison feature. Basically, it can auto-generate scripts between two schemas for you. If you use this against two versions of your database, it will give you an upgrade script.
http://msdn.microsoft.com/en-us/data/gg427686

Here at Red Gate we're close to releasing a solution which solves that precise issue using SQL Source Control and SQL Compare. We have an early access program which will allow you to try this out. Please visit the following link for sign-up details.
http://www.red-gate.com/MessageBoard/viewtopic.php?p=46951#46951

Related

Version controlling DDL changes

I'm trying to work out the best way to version changes in SQL.
I know that there are products like Redgate and Microsoft's SSDT, but equally I'm wondering if a more manual process might make it easier to automate deployments.
I have the following requirements
Must be able to produce diffs on table structure
Must be able to automate changes against the database
Must be able to blame changes and view commit comments
If I was using Redgate or SSDT, would I need to generate deployment scripts from the current state of the database?
Currently I'm wondering if placing change scripts and a synchronised create script into git/svn would be the easiest way to deliver this. But the likelihood of the two getting out of sync makes me uneasy.

Keeping change scripts in source control is a pain because to get back to a specific point in time you need to run the base create script and then all the change scripts. You also have to write rollback scripts by hand (if you need them).
Writing change scripts manually is also a pain, as you have to actually write the scripts, and well, this is the year 2015, so don't do that!
So I would really recommend using either SSDT or Redgate. Redgate is cool, but SSDT is free and also includes design-time checking and refactoring, so if you do something like rename a table in SSDT it will generate an sp_rename rather than dropping the first table and creating a new one (which is what the Redgate tool would do).
Whichever you use, there are command-line versions of the tools to do a compare / deploy when you actually want to release, so just have the checked-in code show the state you want the database to be in when you do the release.
For your requirements, both Redgate and SSDT cover the first two; number 3 is taken care of by having your database represented by create statements in source control.
The only thing you don't mention is static or reference data. To handle this, either use the Redgate data compare tool or, if you are not going to use SSDT, possibly the Redgate source control SSMS add-in, which lets you link tables to CSV files.
If you are going to use SSDT, then use a post-deploy script and have a MERGE statement for each table you need to store in source control.
SSDT rocks and really is the way forward for anyone developing T-SQL code (even if they don't realise it yet!)
Ed

How to properly manage database deployment with SSDT and Visual Studio 2012 Database Projects?

I'm in the research phase trying to adopt 2012 Database Projects on an existing small project. I'm a C# developer, not a DBA, so I'm not particularly fluent with best practices. I've been searching google and stackoverflow for a few hours now but I still don't know how to handle some key deployment scenarios properly.
1) Over the course of several development cycles, how do I manage multiple versions of my database? If I have a client on v3 of my database and I want to upgrade them to v8, how do I manage this? We currently manage hand-crafted schema and data migration scripts for every version of our product. Do we still need to do this separately or is there something in the new paradigm that supports or replaces this?
2) If the schema changes in such a way that requires data to be moved around, what is the best way to handle this? I assume some work goes in the Pre-Deployment script to preserve the data and then the Post-Deploy script puts it back in the right place. Is that the way of it or is there something better?
3) Any other advice or guidance on how best to work with these new technologies is also greatly appreciated!
UPDATE: My understanding of the problem has grown a little since I originally asked this question and while I came up with a workable solution, it wasn't quite the solution I was hoping for. Here's a rewording of my problem:
The problem I'm having is purely data related. If I have a client on version 1 of my application and I want to upgrade them to version 5 of my application, I would have no problems doing so if their database had no data. I'd simply let SSDT intelligently compare schemas and migrate the database in one shot. Unfortunately clients have data so it's not that simple. Schema changes from version 1 of my application to version 2 to version 3 (etc) all impact data. My current strategy for managing data requires I maintain a script for each version upgrade (1 to 2, 2 to 3, etc). This prevents me from going straight from version 1 of my application to version 5 because I have no data migration script to go straight there. The prospect of creating custom upgrade scripts for every client or managing upgrade scripts to go from every version to every greater version is exponentially unmanageable. What I was hoping was that there was some sort of strategy SSDT enables that makes managing the data side of things easier, maybe even as easy as the schema side of things. My recent experience with SSDT has not given me any hope of such a strategy existing but I would love to find out differently.

I've been working on this myself, and I can tell you it's not easy.
First, to address the reply by JT - you cannot dismiss "versions", even with the declarative updating mechanics that SSDT has. SSDT does a "pretty decent" job (provided you know all the switches and gotchas) of moving any source schema to any target schema, and it's true that this doesn't require versioning per se, but it has no idea how to manage "data motion" (at least not that I can see!). So, just like DBProj, you are left to your own devices in Pre/Post scripts. Because the data motion scripts depend on a known start and end schema state, you cannot avoid versioning the DB. The "data motion" scripts, therefore, must be applied to a versioned snapshot of the schema, which means you cannot arbitrarily update a DB from v1 to v8 and expect the data motion scripts v2 to v8 to work (presumably, you wouldn't need a v1 data motion script).
Sadly, I can't see any mechanism in SSDT publishing that allows me to handle this scenario in an integrated way. That means you'll have to add your own scaffolding.
The first trick is to track versions within the database (and SSDT project). I started using a trick in DBProj, and brought it over to SSDT, and after doing some research, it turns out that others are using this too. You can apply a DB Extended Property to the database itself (call it "BuildVersion" or "AppVersion" or something like that), and apply the version value to it. You can then capture this extended property in the SSDT project itself, and SSDT will add it as a script (you can then check the publish option that includes extended properties). I then use SQLCMD variables to identify the source and target versions being applied in the current pass. Once you identify the delta of versions between the source (project snapshot) and target (target db about to be updated), you can find all the snapshots that need to be applied. Sadly, this is tricky to do from inside the SSDT deployment, and you'll probably have to move it to the build or deployment pipeline (we use TFS automated deployments and have custom actions to do this).
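As a rough illustration of the "stamp the version in the database, then work out which snapshots still need applying" idea, here is a sketch that runs outside SSDT. It assumes SQLite, with `PRAGMA user_version` standing in for the SQL Server extended property, and the `scripts` mapping and all function names are hypothetical.

```python
import sqlite3

def get_version(conn):
    """Read the version stamp stored in the database itself."""
    return conn.execute("PRAGMA user_version").fetchone()[0]

def set_version(conn, version):
    # PRAGMA does not accept bound parameters, so force an int before formatting.
    conn.execute(f"PRAGMA user_version = {int(version)}")

def pending_versions(current, target):
    """Versions whose snapshot must be applied, in order, to reach target."""
    if target < current:
        raise ValueError("downgrades are not handled in this sketch")
    return list(range(current + 1, target + 1))

def upgrade(conn, scripts, target):
    """scripts maps version -> SQL that moves the database from version - 1 to version."""
    for version in pending_versions(get_version(conn), target):
        conn.executescript(scripts[version])
        set_version(conn, version)  # stamp after each step so a rerun resumes here
```

For a database stamped v3 and a target of v8, `pending_versions(3, 8)` yields 4 through 8, which is the list a build or deployment pipeline step would then walk through.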
The next hurdle is to keep snapshots of the schema with their associated data motion scripts. In this case, it helps to make the scripts as idempotent as possible (meaning, you can rerun the scripts without any ill side-effects). It helps to split scripts that can safely be rerun from scripts that must be executed one time only. We're doing the same thing with static reference data (dictionary or lookup tables) - in other words, we have a library of MERGE scripts (one per table) that keep the reference data in sync, and these scripts are included in the post-deployment scripts (via the SQLCMD :r command). The important thing to note here is that you must execute them in the correct order in case any of these reference tables have FK references to each other. We include them in the main post-deploy script in order, and it helps that we created a tool that generates these scripts for us - it also resolves dependency order. We run this generation tool at the close of a "version" to capture the current state of the static reference data. All your other data motion scripts are basically going to be special-case and most likely will be single-use only. In that case, you can do one of two things: you can use an IF statement against the db build/app version, or you can wipe out the 1 time scripts after creating each snapshot package.
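The dependency-ordering part of that generator can be sketched with a topological sort. This is not the tool mentioned above: the table names, the `FK_DEPENDENCIES` map and the file naming scheme are invented for illustration, and it assumes Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical reference tables mapped to the tables their foreign keys point at.
FK_DEPENDENCIES = {
    "Country": set(),
    "Region": {"Country"},
    "City": {"Region"},
    "Currency": set(),
    "CountryCurrency": {"Country", "Currency"},
}

def merge_script_order(dependencies):
    """Return table names so that every referenced table comes before its referrers."""
    return list(TopologicalSorter(dependencies).static_order())

def include_lines(dependencies):
    # One SQLCMD :r include per table; the folder and file names are made up.
    return [f":r .\\ReferenceData\\{table}.merge.sql"
            for table in merge_script_order(dependencies)]

if __name__ == "__main__":
    print("\n".join(include_lines(FK_DEPENDENCIES)))
```

The printed lines are what would be pasted (or generated) into the main post-deploy script, so that a parent lookup table is always merged before the tables that reference it.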
It helps to remember that SSDT will disable FK check constraints and only re-enable them after the post-deployment scripts run. This gives you a chance to populate new non-null fields, for example (by the way, you have to enable the option to generate temporary "smart" defaults for non-null columns to make this work). However, FK check constraints are only disabled for tables that SSDT is recreating because of a schema change. For other cases, you are responsible for ensuring that data motion scripts run in the proper order to avoid check constraint complaints (or you have to manually disable/re-enable them in your scripts).
DACPAC can help you because DACPAC is essentially a snapshot. It will contain several XML files describing the schema (similar to the build output of the project), but frozen in time at the moment you create it. You can then use SQLPACKAGE.EXE or the deploy provider to publish that package snapshot. I haven't quite figured out how to use the DACPAC versioning, because it's more tied to "registered" data apps, so we're stuck with our own versioning scheme, but we do put our own version info into the DACPAC filename.
I wish I had a more conclusive and exhaustive example to provide, but we're still working out the issues here too.
One thing that really sucks about SSDT is that unlike DBProj, it's currently not extensible. Although it does a much better job than DBProj at a lot of different things, you can't override its default behavior unless you can find some method inside of pre/post scripts of getting around a problem. One of the issues we're trying to resolve right now is that the default method of recreating a table for updates (CCDR) really stinks when you have tens of millions of records.
-UPDATE: I haven't seen this post in some time, but apparently it's been active lately, so I thought I'd add a couple of important notes: if you are using VS2012, the June 2013 release of SSDT now has a Data Comparison tool built-in, and also provides extensibility points - that is to say, you can now include Build Contributors and Deployment Plan Modifiers for the project.

I haven't really found any more useful information on the subject but I've spent some time getting to know the tools, tinkering and playing, and I think I've come up with some acceptable answers to my question. These aren't necessarily the best answers. I still don't know if there are other mechanisms or best practices to better support these scenarios, but here's what I've come up with:
The Pre- and Post-Deploy scripts for a given version of the database are only used to migrate data from the previous version. At the start of every development cycle, the scripts are cleaned out and as development proceeds they get fleshed out with whatever SQL is needed to safely migrate data from the previous version to the new one. The one exception here is static data in the database. This data is known at design time and maintains a permanent presence in the Post-Deploy scripts in the form of T-SQL MERGE statements. This helps make it possible to deploy any version of the database to a new environment with just the latest publish script. At the end of every development cycle, a publish script is generated from the previous version to the new one. This script will include generated SQL to migrate the schema and the hand-crafted deploy scripts. Yes, I know the Publish tool can be used directly against a database but that's not a good option for our clients. I am also aware of dacpac files but I'm not really sure how to use them. The generated publish script seems to be the best option I know for production upgrades.
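The property that matters for the static-data scripts is that they can be rerun on every deployment and always leave the table in the same state. A small sketch of that behaviour follows, assuming SQLite 3.24+ and using its upsert syntax as a stand-in for a T-SQL MERGE; the `order_status` table and its rows are made up.

```python
import sqlite3

# Hypothetical static lookup data, known at design time.
ORDER_STATUS = [(1, "New"), (2, "Shipped"), (3, "Cancelled")]

def sync_reference_data(conn, rows):
    """Make order_status match rows exactly; safe to rerun on every deployment."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS order_status "
        "(id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    # Insert new rows and update rows whose name has drifted.
    conn.executemany(
        "INSERT INTO order_status (id, name) VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET name = excluded.name",
        rows,
    )
    # Delete rows that are no longer part of the reference set.
    keep = [row_id for row_id, _ in rows]
    placeholders = ",".join("?" * len(keep))
    conn.execute(f"DELETE FROM order_status WHERE id NOT IN ({placeholders})", keep)
    conn.commit()
```

Running it against an empty database, or against one where somebody edited or added lookup rows by hand, ends with the same three rows either way.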
So to answer my scenarios:
1) To upgrade a database from v3 to v8, I would have to execute the generated publish script for v4, then for v5, then for v6, etc. This is very similar to how we do it now. It's well understood and Database Projects seem to make creating/maintaining these scripts much easier.
2) When the schema changes from underneath data, the Pre- and Post-Deploy scripts are used to migrate the data to where it needs to go for the new version. Affected data is essentially backed-up in the Pre-Deploy script and put back into place in the Post-Deploy script.
3) I'm still looking for advice on how best to work with these tools in these scenarios and others. If I got anything wrong here, or if there are any other gotchas I should be aware of, please let me know! Thanks!

In my experience of using SSDT the notion of version numbers (i.e. v1, v2...vX etc...) for databases kinda goes away. This is because SSDT offers a development paradigm known as declarative database development which loosely means that you tell SSDT what state you want your schema to be in and then let SSDT take responsibility for getting it into that state by comparing against what you already have. In this paradigm the notion of deploying v4 then v5 etc.... goes away.
Your pre and post deployment scripts, as you correctly state, exist for the purposes of managing data.
Hope that helps.
JT

I just wanted to say that this thread so far has been excellent.
I have been wrestling with the exact same concerns and am attempting to tackle this problem in our organization, on a fairly large legacy application. We've begun the process of moving toward SSDT (on a TFS branch) but are at the point where we really need to understand the deployment process, and managing custom migrations, and reference/lookup data, along the way.
To complicate things further, our application is one code-base but can be customized per 'customer', so we have about 190 databases we are dealing with, for this one project, not just 3 or so as is probably normal. We do deployments all the time and even set up new customers fairly often. We rely heavily on PowerShell now with old-school incremental release scripts (and associated scripts to create a new customer at that version). I plan to contribute once we figure this all out but please share whatever else you've learned. I do believe we will end up maintaining custom release scripts per version, but we'll see. The idea about maintaining each script within the project, and including a From and To SqlCmd variable is very interesting. If we did that, we would probably prune along the way, physically deleting the really old upgrade scripts once everybody was past that version.
BTW - Side note - On the topic of minimizing waste, we also just spent a bunch of time figuring out how to automate the enforcement of proper naming/data type conventions for columns, as well as automatic generation for all primary and foreign keys, based on naming conventions, as well as index and check constraints etc. The hardest part was dealing with the 'deviants' that didn't follow the rules. Maybe I'll share that too one day if anyone is interested, but for now, I need to pursue this deployment, migration, and reference data story heavily. Thanks again. It's like you guys were speaking exactly what was in my head and looking for this morning.

How to keep 2 Database Schemas consistent without affecting the data at all?

I have two server machines (one for development, the other for clients) with SQL Server 2008 installations. Whenever a developer makes changes to tables/views/stored procedures on the Development Server, they need to be reflected on the Client Server as well.
Currently, I am manually handling all changes like new columns in tables, changes in stored procedures etc. Can DB scripts or replication automate the entire procedure for me? Or is there some better solution to keep database schemas consistent?
Help will be highly appreciated.
Thanks!

I highly recommend creating an environment where all schema changes are done exclusively through SQL scripts - never "manually" in any environment. Each developer has to commit the script related to his/her bug fixes (or new features) to a version control system.
Typically you'd have one big script that creates the database from scratch and one for each version upgrade (one from 1.0 to 1.1, one from 1.1 to 1.2 and so on).
If you have the manpower it is also very handy to maintain one "from-scratch" script for each version. Whether you need that or not depends on how often an installation on an empty system is done.
We have very good experience with using Liquibase to maintain all this. It automatically keeps track of which patches have been applied to a database and which need to be run during an upgrade. It also prevents you from running the same migration twice.

A problem that all database applications have, and a difficult one to resolve. Such a solution cannot be scheduled, as the changes made by developers need to be tested first, and you certainly don't want untested code merged with your live database. This question is of interest to me because I'm currently writing a generic solution to resolve this issue once and for all.
But in the meantime, we're using an open-source product called Open DBDiff (Google it - you can't miss it), which could do with some polishing but works well enough. You pass it your source and target databases, and it generates a script to make the target the same as the source. It does seem to have some trouble copying assemblies and user roles, but for everything else, I haven't had any trouble.

I believe a human should do the deployments, after making sure the changes have been tested and properly checked into the source control. This is not something to automate fully.
The human should use tools, though. I use Visual Studio 2010 Professional, which has a powerful schema comparison tool, generates and executes deployment scripts, and has source control integration.

Defining table structure for a database?

Up until now, my experience with databases has always been working with an intermediate definition layer that we have where I work. i.e. SQL wasn't directly written for the table definitions, but generated from an intermediate file which wrote out SQL scripts for creating the appropriate tables, upgrade scripts between schema changes, and helper functions for doing simple queries/updates/inserts/deletes from the database.
Now I'm in a situation where I don't have access to that, for reasons I won't get into, and I find myself somewhat lost at sea regarding what to do. I need to have a small number of tables in a database, and I'm unsure what's usually done to manage the table definitions.
Do people normally just use the SQL script that does the table creation as their definition, or does everyone just use an IDE that manages the definition in a separate file and regenerates the SQL script to create the tables?
I'd really prefer not to have to introduce a dependence on a specific IDE, because as we all know, developers are whiners that are prone to religious debates over small things.

Open your favorite text editor -> Start writing CREATE scripts -> Save -> Put in Source Control
That script now becomes the basis for your database. Anytime there are schema changes, they get put back into the scripts so that they don't get lost.
These become your definition.
I find it more reliable than depending on any specific IDE/Platform generating those scripts for you.

We write the scripts ourselves and store them in source control like any other code. Then the scripts that are appropriate for a particular version are all grouped together and promoted to prod together. Make sure to use ALTER TABLE when changing existing tables because you don't want to drop and recreate them if they have data! I use a drop and recreate for all other objects though. If you need to add records to a particular table (usually a lookup of some type) we do that in scripts as well. Then that too gets promoted with the rest of the version code.
For me, putting the scripts in source control, however they are generated, is the key step. This is how you know what you have changed for the next release. This is how you can see earlier versions and revert back easily if there is a problem. Treat database code the same way you treat all other code.
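The difference between altering a table and recreating other objects is easy to demonstrate. The sketch below assumes SQLite (the answer does not say which database is used) and invented `customer` and `customer_names` objects; the point is only that the rows survive the ALTER while the view, which holds no data, can be dropped freely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Version 1 of the schema, already holding data.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer (name) VALUES ('Alice'), ('Bob')")

# Change script for the next version: alter the table in place so rows survive.
conn.execute("ALTER TABLE customer ADD COLUMN email TEXT")

# Views hold no data, so drop-and-recreate is safe for them.
conn.execute("DROP VIEW IF EXISTS customer_names")
conn.execute("CREATE VIEW customer_names AS SELECT id, name FROM customer")

# The existing rows are still there after the change script has run.
rows_after = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(rows_after)
```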

You could use one of the data modelling tools that create scripts for you if you are starting out on a database design and eventually want the tool to create the database for you. Some tools for that are Erwin, Fabforce etc. (though not free).
If you have access to an IDE like SQL Management Studio, you can create them using a GUI that's pretty simple.
If you are writing your own code, it's always better to write your own scripts based on a good template so that you cover all the properties of the table definition, like the filegroup, collation and so on. Hope this helps.
Once you have created a base copy, generate scripts from it and keep them as a base reference so that you can make "incremental" changes on top of them and manage them in source control.

Though I use TOAD for Oracle, I always write the scripts to create my database objects by hand. It gives you (and your DBAs) more control and knowledge of what's being created and how.

If your schema is too difficult to describe in SQL, you probably have other issues more pressing than which IDE. Use modelling documentation if you need a graphical representation, but yeah, you don't need an IDE.

There are multiple ways out there for what you are asking.
The old, traditional way is to ship a script file with your application that contains the CREATE TABLE statements.
If you are a developer, and in particular a Java enterprise developer, you could generate the complete schema using a persistence library called Hibernate.
If you are a DBA-level user, you could take a schema export from one environment and import it into your current environment. This is a standard practice among DBAs, but it requires admin access, as you can see. Also, the methods depend on the database you are using (Oracle, DB2 etc.).

What's the best build system for building a database?

This is a problem that I come to on occasion and have yet to work out an answer that I'm happy with. I'm looking for a build system that works well for building a database - that is running all of the SQL files in the correct database instance as the correct user and in the correct order, and handling dependencies and the like properly.
I have a system that I hacked together using GNU Make and it works, but it's not especially flexible and frankly can be a bit of a pain to work with in some situations. I've considered looking at things like SCons and CMake too, but I don't know how much better they are likely to be, or if there's a better system out there that already exists...

Just a shell script that runs all the create statements and imports in the proper order. You may also find migrations (comes with Rails) interesting. It provides a make-like infrastructure that lets you maintain a database whose structure evolves over time.
Say you add a new column to some table. In migrations you'd write a snippet of code that describes how to add the column and also how to roll the change back, so you can switch to different versions of your schema automatically.
I'm not a big fan of the tight integration with rails, though, but the principles behind it are very interesting.
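The principle can be shown without Rails. The following is a sketch of the up/down idea only, not the Rails migrations API: the `MIGRATIONS` list and `migrate` function are hypothetical, SQLite is assumed, and an index is used as the second change instead of a column so the rollback works on older SQLite versions.

```python
import sqlite3

# Each migration pairs the forward change with the statement that undoes it.
MIGRATIONS = [
    {"up": "CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT)",
     "down": "DROP TABLE post"},
    {"up": "CREATE INDEX ix_post_title ON post (title)",
     "down": "DROP INDEX ix_post_title"},
]

def migrate(conn, current, target):
    """Move the schema from version current to version target, in either direction.

    The version is simply the number of migrations that have been applied.
    """
    while current < target:
        conn.execute(MIGRATIONS[current]["up"])
        current += 1
    while current > target:
        current -= 1
        conn.execute(MIGRATIONS[current]["down"])
    return current
```

Calling `migrate(conn, 0, 2)` builds the full schema, and `migrate(conn, 2, 1)` rolls back only the most recent change.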

For SQL Server, I just use a batch file with SQLCMD.EXE and a bunch of .SQL files. It's not perfect, but it seems to work.
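The same "run a folder of .SQL files in order" approach looks roughly like this as a script. It is a sketch that assumes SQLite instead of SQL Server and SQLCMD, and that the files are named so that sorting by name gives the right order (for example a numeric prefix); `run_sql_files` is a made-up helper.

```python
import sqlite3
from pathlib import Path

def run_sql_files(conn, folder):
    """Execute every .sql file in folder, ordered by file name."""
    executed = []
    for path in sorted(Path(folder).glob("*.sql")):
        # executescript runs all statements in the file, one after another.
        conn.executescript(path.read_text(encoding="utf-8"))
        executed.append(path.name)
    return executed
```

Like the batch file, it has no notion of dependencies beyond the naming convention, which is the "not perfect" part.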

For my database, I use Migrator.NET
This is a .NET framework which allows you to create classes in which you define your DDL statements.
The framework comes with a command-line tool with which you can execute your 'migrations' in the correct order.
It also has an MSBuild task, so you can integrate it in a continuous integration build as well.

First export full DDL files describing all tables, views, source code (procedures, functions, packages), sequences, and grants of a DB schema.
See: Is there a tool to generate a full database DDL for SQL Server? What about Postgres and MySQL?
I created a database build system (part SQL-parser, part make file) to put these files together in a DB creation script using python.
