Database objects version control (not schema) - sql

I'm trying to figure out how to implement version control in an environment where we have two DBs: one Testing and one Production.
In Testing, there are an arbitrary number of tasks being tested. These have no constraints on the number of objects manipulated or on complexity, meaning we can have a 3-day task that changes 2 package bodies and one trigger, and we can have a 3-month task that changes 100 different objects, including C source files and binary objects.
My main concern is the text-based objects of the DB. We need to version the Testing and Production code, but any task can go from Testing to Production with no defined order whatsoever.
This means right now we have to manually track the changes in the files, selecting inside each file which lines of code go from Testing to Production. We use a very rudimentary solution: writing a sequence of comments in the header with a file-based version number, and adding tags with that sequence in the code to delimit each change.
I'm struggling to implement SVN because I wanted to create Testing as a branch of Production, with branches inside Testing to isolate each task, but I find that this can lead to many Testing tasks being ported to Production during merges.
This said, my questions are:
Is there a way to resolve this automatically?
Are there any database-specific version control solutions?
How can I "link" both environments if the code base is so different?

I used SVN for source control on DB scripts.
I don't have a technological solution to your problem, but I can explain the methodology we used.
We had two sets of scripts - one for incremental changes and another for the complete declaration of database objects and procedures.
During development we updated only the incremental changes, in a script that was eventually used during deployment. During test rounds we kept updating that script.
Finally, after running the script on Production, we updated the second set of scripts containing the full declarations. The full scripts were used as a reference and to create a DB from scratch.
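As a rough illustration of the two sets of scripts (the folder, file and object names here are made up, not from the original answer):

```sql
-- incremental/2014-03-12_add_customer_email.sql
-- Small, ordered change applied to Testing and later to Production.
ALTER TABLE customer ADD email VARCHAR(255);

-- full/customer.sql
-- Complete declaration, updated only after the incremental script has run
-- on Production; used as a reference and to build a DB from scratch.
CREATE TABLE customer (
    id    INT          NOT NULL PRIMARY KEY,
    name  VARCHAR(100) NOT NULL,
    email VARCHAR(255)
);
```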


SSDT: versioning reference data with DACPAC

DACPAC is a nice solution for versioning the schema, but we have to use pre/post-deployment scripts to amend the reference data.
Is there a better solution for doing that?
The best way I have seen is to use MERGE statements, one table per file, and import them into your post-deploy script using :r includes.
You get version history and easily comparable data, and using sp_generate_merge makes it really simple.
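To make that concrete, here is a rough sketch of the layout described above, with made-up folder, file and table names; the MERGE is shaped like typical sp_generate_merge output:

```sql
-- Script.PostDeployment.sql (runs after every publish, SQLCMD mode):
-- one MERGE file per reference table, pulled in with :r includes.
:r .\ReferenceData\dbo.Colour.sql
:r .\ReferenceData\dbo.OrderStatus.sql

-- ReferenceData\dbo.Colour.sql (illustrative contents):
MERGE INTO dbo.Colour AS t
USING (VALUES
    (1, N'Red'),
    (2, N'Green'),
    (3, N'Blue')
) AS s (ColourId, ColourName)
ON t.ColourId = s.ColourId
WHEN MATCHED AND t.ColourName <> s.ColourName THEN
    UPDATE SET ColourName = s.ColourName
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ColourId, ColourName) VALUES (s.ColourId, s.ColourName)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;
```

Because each table lives in its own file, diffs in source control show exactly which reference rows changed between versions.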
Ed
If you're looking for a solution in SSDT to handle reference data that does not involve the use of pre/post-deployment scripts, unfortunately there currently isn't one.
But it is currently one of the most requested features in SSDT so perhaps there's a chance it will get implemented some time in the future.
I am the maintainer of the sp_generate_merge OSS utility Ed mentioned, and at Redgate we recommend this approach to handling reference data in an offline way to our customers, in the following circumstances:
If the data within the table is changing very frequently, as the one-file-per-table approach allows for branching/merging of concurrent changes from multiple developers
If the data in the table contains environment-specific values, like application settings, as this method allows you to use SQLCMD variables and feed the values in from a deployment tool
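For the environment-specific case, a minimal sketch of what that can look like, assuming a hypothetical dbo.AppSetting table and a SQLCMD variable named SmtpServer supplied by the deployment tool:

```sql
-- Post-deployment reference data with an environment-specific value;
-- $(SmtpServer) is a SQLCMD variable fed in per environment.
MERGE INTO dbo.AppSetting AS t
USING (VALUES
    (N'SmtpServer', N'$(SmtpServer)'),
    (N'RetryCount', N'3')
) AS s (SettingName, SettingValue)
ON t.SettingName = s.SettingName
WHEN MATCHED THEN
    UPDATE SET SettingValue = s.SettingValue
WHEN NOT MATCHED BY TARGET THEN
    INSERT (SettingName, SettingValue) VALUES (s.SettingName, s.SettingValue);
```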
Where the offline approach can be problematic:
Non-determinism of the MERGE statement: before actually running the deployment against your target environment, it can be difficult to know what changes will be applied (if any). In the worst case, you could hit one of the documented issues with MERGE.
The workflow isn't necessarily the most natural way to edit data, as it requires running the utility proc and copying and pasting the output back into the original file. Editing the file directly is an alternative, but isn't the most user-friendly experience, especially with large amounts of reference data.
Co-ordinating changes to both the schema and the data within the reference table can be a challenge, given that SSDT is still responsible for applying the schema changes. For example, if you want to add a new NOT NULL column without a default.
Another solution involves following an online approach, which is supported by our SSDT-alternative, ReadyRoll database projects. It allows the data to be edited directly in the database and subsequently imported into the project, with the sync script (i.e. containing INSERT, UPDATE, DELETE statements instead of MERGE) generated by its data comparison tool, alongside any schema changes.
You can read more about how the offline and online approaches differ in the ReadyRoll documentation.

How to Version Control The Baseline Of DB Schema in Liquibase

I am currently evaluating how my organisation could make use of Liquibase as a DB versioning system.
However, I have difficulty seeing how the Liquibase philosophy fits into our current workflow:
Currently we have the CREATE and initial-fill SQL scripts of our application under version control. As soon as there is a change (e.g. a new column), the programmer adjusts the CREATE script and checks the changes in. Since application code and SQL scripts are in the same repository, the DB objects should always be in sync with the application version.
Additionally, for each release we maintain a list of ALTER... statements which are used when the customer upgrades our software (which is much more often the case than installing from scratch).
So we have two worlds - the current version of the schema objects as CREATE statements in the repository plus a list of necessary actions to get from version 1 to 2 (ALTER statements).
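To illustrate the duplication (the table and column names are invented for the example), the same change ends up in both places:

```sql
-- World 1: the full CREATE script in the repository, kept in sync with the app.
CREATE TABLE customer (
    id    INT          NOT NULL PRIMARY KEY,
    name  VARCHAR(100) NOT NULL,
    email VARCHAR(255)              -- the newly added column
);

-- World 2: the per-release upgrade script applied at customer sites.
ALTER TABLE customer ADD email VARCHAR(255);
```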
What I like is that the definition of the schema objects always matches the version of the software, since they are in the same version control repository.
What I don't like (and hence I am looking for an alternative) is the double work we have to do. Furthermore, since the software is more often updated than newly installed, the CREATE statements serve mostly as documentation and are rarely applied to a database.
What I understood is that Liquibase starts with a baseline and then operates on small change sets. So I would check in the baseline once and then add my small change sets.
Over time I might have an old baseline with lots of change sets. I assume I would then have to manually generate a new baseline out of the old baseline plus the changesets and start from there again. This sounds rather confusing to me. I'm also not sure whether my co-workers would see the benefits compared to our current workflow.
What would you recommend?
This is just my opinion, so take it as such.
The Liquibase philosophy is that the baselines you mention are not needed, as long as the database can be queried to see what changes have been applied to it, so your assumption that you would have to periodically generate a new baseline is incorrect.
Let's compare how your process works vs. the Liquibase process.
In your current system, as you said, developers have to maintain two sets of SQL scripts and your testers have to ensure that they are correct. When users install, your installer has to detect whether it is a clean install and run just the 'create' scripts. When the installer detects an upgrade, it has to take a different path (possibly using some complex logic) to determine which of the alter scripts to run. Your organization has to maintain the installer code that does the upgrade logic.
Using Liquibase, developers would only create the 'alter' part of the database scripts. For a new install, it might take fractionally longer to run all the Liquibase statements than it would take to run the create scripts in your current system, but the benefit is reduced duplication and therefore reduced bugs. For upgrade installs, Liquibase determines by looking at the installed database which changes are in place and only runs the changes that are necessary to bring the database up to date with the code that is being installed.
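For context, this is roughly what that looks like in Liquibase's SQL-formatted changelog syntax (the changeset authors, IDs and table names are invented for the example):

```sql
--liquibase formatted sql
-- A minimal SQL-formatted changelog. Liquibase records each changeset in its
-- DATABASECHANGELOG table and, on the next run, applies only the changesets
-- that the target database hasn't seen yet.

--changeset alice:1
CREATE TABLE customer (
    id   INT          NOT NULL PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

--changeset alice:2
ALTER TABLE customer ADD email VARCHAR(255);
```

On a clean install Liquibase runs every changeset; on an upgrade it consults DATABASECHANGELOG and runs only the ones that haven't been applied yet, which replaces the hand-maintained ALTER lists.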

How to properly manage database deployment with SSDT and Visual Studio 2012 Database Projects?

I'm in the research phase, trying to adopt 2012 Database Projects on an existing small project. I'm a C# developer, not a DBA, so I'm not particularly fluent with best practices. I've been searching Google and Stack Overflow for a few hours now, but I still don't know how to handle some key deployment scenarios properly.
1) Over the course of several development cycles, how do I manage multiple versions of my database? If I have a client on v3 of my database and I want to upgrade them to v8, how do I manage this? We currently manage hand-crafted schema and data migration scripts for every version of our product. Do we still need to do this separately or is there something in the new paradigm that supports or replaces this?
2) If the schema changes in such a way that requires data to be moved around, what is the best way to handle this? I assume some work goes in the Pre-Deployment script to preserve the data and then the Post-Deploy script puts it back in the right place. Is that the way of it or is there something better?
3) Any other advice or guidance on how best to work with these new technologies is also greatly appreciated!
UPDATE: My understanding of the problem has grown a little since I originally asked this question and while I came up with a workable solution, it wasn't quite the solution I was hoping for. Here's a rewording of my problem:
The problem I'm having is purely data related. If I have a client on version 1 of my application and I want to upgrade them to version 5 of my application, I would have no problems doing so if their database had no data. I'd simply let SSDT intelligently compare schemas and migrate the database in one shot. Unfortunately clients have data, so it's not that simple. Schema changes from version 1 of my application to version 2 to version 3 (etc.) all impact data. My current strategy for managing data requires I maintain a script for each version upgrade (1 to 2, 2 to 3, etc.). This prevents me from going straight from version 1 of my application to version 5 because I have no data migration script to go straight there. The prospect of creating custom upgrade scripts for every client, or managing upgrade scripts to go from every version to every greater version, is exponentially unmanageable. What I was hoping was that there was some sort of strategy SSDT enables that makes managing the data side of things easier, maybe even as easy as the schema side of things. My recent experience with SSDT has not given me any hope of such a strategy existing, but I would love to find out differently.
I've been working on this myself, and I can tell you it's not easy.
First, to address the reply by JT - you cannot dismiss "versions", even with the declarative updating mechanics that SSDT has. SSDT does a "pretty decent" job (provided you know all the switches and gotchas) of moving any source schema to any target schema, and it's true that this doesn't require versioning per se, but it has no idea how to manage "data motion" (at least not that I can see!). So, just like DBProj, you're left to your own devices in Pre/Post scripts. Because the data motion scripts depend on a known start and end schema state, you cannot avoid versioning the DB. The "data motion" scripts, therefore, must be applied to a versioned snapshot of the schema, which means you cannot arbitrarily update a DB from v1 to v8 and expect the data motion scripts v2 to v8 to work (presumably, you wouldn't need a v1 data motion script).
Sadly, I can't see any mechanism in SSDT publishing that allows me to handle this scenario in an integrated way. That means you'll have to add your own scaffolding.
The first trick is to track versions within the database (and SSDT project). I started using a trick in DBProj, and brought it over to SSDT, and after doing some research, it turns out that others are using this too. You can apply a DB Extended Property to the database itself (call it "BuildVersion" or "AppVersion" or something like that), and apply the version value to it. You can then capture this extended property in the SSDT project itself, and SSDT will add it as a script (you can then check the publish option that includes extended properties). I then use SQLCMD variables to identify the source and target versions being applied in the current pass. Once you identify the delta of versions between the source (project snapshot) and target (target db about to be updated), you can find all the snapshots that need to be applied. Sadly, this is tricky to do from inside the SSDT deployment, and you'll probably have to move it to the build or deployment pipeline (we use TFS automated deployments and have custom actions to do this).
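A minimal sketch of the extended-property trick, assuming a property called "BuildVersion" and a SQLCMD variable named TargetBuildVersion (both names are illustrative):

```sql
-- One-time setup: a database-level extended property that SSDT will capture
-- in the project and include in the publish script.
EXEC sys.sp_addextendedproperty
    @name  = N'BuildVersion',
    @value = N'1.0.0';

-- Bumped as part of each release (e.g. from the post-deployment script);
-- $(TargetBuildVersion) is fed in by the build/deployment pipeline.
EXEC sys.sp_updateextendedproperty
    @name  = N'BuildVersion',
    @value = N'$(TargetBuildVersion)';

-- Reading the currently deployed version from a target database:
SELECT value AS BuildVersion
FROM sys.extended_properties
WHERE class = 0 AND name = N'BuildVersion';
```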
The next hurdle is to keep snapshots of the schema together with their associated data motion scripts. In this case, it helps to make the scripts as idempotent as possible (meaning you can rerun the scripts without any ill side-effects). It helps to split scripts that can safely be rerun from scripts that must be executed one time only. We're doing the same thing with static reference data (dictionary or lookup tables) - in other words, we have a library of MERGE scripts (one per table) that keep the reference data in sync, and these scripts are included in the post-deployment scripts (via the SQLCMD :r command). The important thing to note here is that you must execute them in the correct order in case any of these reference tables have FK references to each other. We include them in the main post-deploy script in order, and it helps that we created a tool that generates these scripts for us - it also resolves dependency order. We run this generation tool at the close of a "version" to capture the current state of the static reference data. All your other data motion scripts are basically going to be special-case and most likely single-use only. In that case, you can do one of two things: you can use an IF statement against the db build/app version, or you can wipe out the one-time scripts after creating each snapshot package.
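A rough sketch of the IF-against-the-build-version option, reusing the hypothetical "BuildVersion" extended property from above (the table names are invented):

```sql
-- One-time data motion, gated on the version recorded in the target before
-- this deployment bumps it, so the post-deployment script stays safe to rerun.
DECLARE @deployedVersion nvarchar(32) =
    (SELECT CONVERT(nvarchar(32), value)
     FROM sys.extended_properties
     WHERE class = 0 AND name = N'BuildVersion');

IF @deployedVersion = N'2.0.0'  -- this script only applies when upgrading from v2
BEGIN
    -- Example: copy legacy address data into a table introduced in v3,
    -- skipping rows that were already migrated.
    INSERT INTO dbo.CustomerAddress (CustomerId, AddressLine1)
    SELECT c.CustomerId, c.LegacyAddress
    FROM dbo.Customer AS c
    WHERE c.LegacyAddress IS NOT NULL
      AND NOT EXISTS (SELECT 1 FROM dbo.CustomerAddress AS a
                      WHERE a.CustomerId = c.CustomerId);
END;
```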
It helps to remember that SSDT will disable FK check constraints and only re-enable them after the post-deployment scripts run. This gives you a chance to populate new non-null fields, for example (by the way, you have to enable the option to generate temporary "smart" defaults for non-null columns to make this work). However, FK check constraints are only disabled for tables that SSDT is recreating because of a schema change. For other cases, you are responsible for ensuring that data motion scripts run in the proper order to avoid check constraint complaints (or you have to manually disable/re-enable the constraints in your scripts).
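For the manual case, a small sketch of toggling a single FK around a data motion step (the table and constraint names are made up):

```sql
-- Temporarily stop the FK from being checked while rows are moved around.
ALTER TABLE dbo.OrderLine NOCHECK CONSTRAINT FK_OrderLine_Order;

-- ... data motion that would otherwise violate the FK mid-flight ...

-- Re-enable WITH CHECK so existing rows are validated and the constraint
-- is marked trusted again.
ALTER TABLE dbo.OrderLine WITH CHECK CHECK CONSTRAINT FK_OrderLine_Order;
```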
DACPAC can help you because DACPAC is essentially a snapshot. It will contain several XML files describing the schema (similar to the build output of the project), but frozen in time at the moment you create it. You can then use SQLPACKAGE.EXE or the deploy provider to publish that package snapshot. I haven't quite figured out how to use the DACPAC versioning, because it's more tied to "registered" data apps, so we're stuck with our own versioning scheme, but we do put our own version info into the DACPAC filename.
I wish I had a more conclusive and exhaustive example to provide, but we're still working out the issues here too.
One thing that really sucks about SSDT is that unlike DBProj, it's currently not extensible. Although it does a much better job than DBProj at a lot of different things, you can't override its default behavior unless you can find some method inside of pre/post scripts of getting around a problem. One of the issues we're trying to resolve right now is that the default method of recreating a table for updates (CCDR) really stinks when you have tens of millions of records.
-UPDATE: I haven't seen this post in some time, but apparently it's been active lately, so I thought I'd add a couple of important notes: if you are using VS2012, the June 2013 release of SSDT now has a Data Comparison tool built-in, and also provides extensibility points - that is to say, you can now include Build Contributors and Deployment Plan Modifiers for the project.
I haven't really found any more useful information on the subject but I've spent some time getting to know the tools, tinkering and playing, and I think I've come up with some acceptable answers to my question. These aren't necessarily the best answers. I still don't know if there are other mechanisms or best practices to better support these scenarios, but here's what I've come up with:
The Pre- and Post-Deploy scripts for a given version of the database are only used to migrate data from the previous version. At the start of every development cycle, the scripts are cleaned out, and as development proceeds they get fleshed out with whatever SQL is needed to safely migrate data from the previous version to the new one. The one exception here is static data in the database. This data is known at design time and maintains a permanent presence in the Post-Deploy scripts in the form of T-SQL MERGE statements. This helps make it possible to deploy any version of the database to a new environment with just the latest publish script.

At the end of every development cycle, a publish script is generated from the previous version to the new one. This script will include generated SQL to migrate the schema and the hand-crafted deploy scripts. Yes, I know the Publish tool can be used directly against a database, but that's not a good option for our clients. I am also aware of dacpac files but I'm not really sure how to use them. The generated publish script seems to be the best option I know for production upgrades.
So to answer my scenarios:
1) To upgrade a database from v3 to v8, I would have to execute the generated publish script for v4, then for v5, then for v6, etc. This is very similar to how we do it now. It's well understood and Database Projects seem to make creating/maintaining these scripts much easier.
2) When the schema changes from underneath data, the Pre- and Post-Deploy scripts are used to migrate the data to where it needs to go for the new version. Affected data is essentially backed up in the Pre-Deploy script and put back into place in the Post-Deploy script (a rough sketch of that pattern follows after this list).
3) I'm still looking for advice on how best to work with these tools in these scenarios and others. If I got anything wrong here, or if there are any other gotchas I should be aware of, please let me know! Thanks!
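For what it's worth, a minimal sketch of the back-up-then-restore pattern from point 2, with invented table names and a holding table that the publish profile must not drop:

```sql
-- Pre-Deployment: stash the affected data before the schema change runs.
IF OBJECT_ID('dbo.__Customer_Phone_Backup') IS NULL
    SELECT CustomerId, Phone
    INTO dbo.__Customer_Phone_Backup
    FROM dbo.Customer;

-- Post-Deployment: move the data into its new home, then clean up.
IF OBJECT_ID('dbo.__Customer_Phone_Backup') IS NOT NULL
BEGIN
    INSERT INTO dbo.CustomerPhone (CustomerId, PhoneNumber)
    SELECT b.CustomerId, b.Phone
    FROM dbo.__Customer_Phone_Backup AS b
    WHERE NOT EXISTS (SELECT 1 FROM dbo.CustomerPhone AS p
                      WHERE p.CustomerId = b.CustomerId);

    DROP TABLE dbo.__Customer_Phone_Backup;
END;
```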
In my experience of using SSDT the notion of version numbers (i.e. v1, v2...vX etc...) for databases kinda goes away. This is because SSDT offers a development paradigm known as declarative database development which loosely means that you tell SSDT what state you want your schema to be in and then let SSDT take responsibility for getting it into that state by comparing against what you already have. In this paradigm the notion of deploying v4 then v5 etc.... goes away.
Your pre and post deployment scripts, as you correctly state, exist for the purposes of managing data.
Hope that helps.
JT
I just wanted to say that this thread so far has been excellent.
I have been wrestling with the exact same concerns and am attempting to tackle this problem in our organization, on a fairly large legacy application. We've begun the process of moving toward SSDT (on a TFS branch) but are at the point where we really need to understand the deployment process, and managing custom migrations, and reference/lookup data, along the way.
To complicate things further, our application is one code-base but can be customized per 'customer', so we have about 190 databases we are dealing with for this one project, not just 3 or so as is probably normal. We do deployments all the time and even set up new customers fairly often. We rely heavily on PowerShell now with old-school incremental release scripts (and associated scripts to create a new customer at that version). I plan to contribute once we figure this all out, but please share whatever else you've learned. I do believe we will end up maintaining custom release scripts per version, but we'll see. The idea of maintaining each script within the project, and including a From and To SqlCmd variable, is very interesting. If we did that, we would probably prune along the way, physically deleting the really old upgrade scripts once everybody was past that version.
BTW - Side note - On the topic of minimizing waste, we also just spent a bunch of time figuring out how to automate the enforcement of proper naming/data type conventions for columns, as well as automatic generation for all primary and foreign keys, based on naming conventions, as well as index and check constraints etc. The hardest part was dealing with the 'deviants' that didn't follow the rules. Maybe I'll share that too one day if anyone is interested, but for now, I need to pursue this deployment, migration, and reference data story heavily. Thanks again. It's like you guys were speaking exactly what was in my head and looking for this morning.

Why isn't database version control considered as important as application version control?

I've recently started using Kiln Source Control for all my projects' VB.NET code, and I don't know how I managed without it!
I've been looking for a database source control solution for all my stored procedures, UDFs, etc. However, I've found that there is not as much available for database version control as there is for my web files.
Why is database version control not considered as important as my web files? Surely all the programming in my database is just as important as the code in my code-behind and .aspx files?
Version controlling database objects IS important!
However, maybe it isn't considered as important because some people see a database merely as a tool that assists them? And external tools (normally) aren't version controlled.
One thing I've found hard to manage is the release process. Right now we're using Red Gate's SQL Source Control connected to SVN. When it's time for release we do the same as with the rest of the code: merge from one branch to the other. Then to deploy it we use SQL Compare to create a diff script between the merged revision and the actual database. Aside from some minor quirks and beginner's mistakes, I think this works well in an environment where there is no downtime (purposefully ;)) and which has a high-speed development process (lots of releases).
You can maintain your database artifacts in the same version control system.
A version control system is for versioning artifacts, and those artifacts can be program code or database objects. We used the same VCS for code and database.
VCSs are designed to store versions of text. They can store binaries, but it is less efficient. And the DB state itself is not text and can't even be directly stored as a binary; you can store the SQL code, though.
One solution is to store a full DB dump (SQL or binary); another is to store sequences of SQL scripts that change one DB state into the next one. The second approach can be automated in some environments, where it is called migrations. If you want a separate DB-specific VCS, you can think of migration tools as exactly that.
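A hand-rolled sketch of the migrations idea (the table and script names are invented; dedicated tools such as Liquibase or Flyway automate this bookkeeping):

```sql
-- Bookkeeping table recording which migrations have been applied.
CREATE TABLE schema_version (
    version    INT       NOT NULL PRIMARY KEY,
    applied_at DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
);

-- migrations/0002_add_customer_email.sql: applied only if not yet recorded.
IF NOT EXISTS (SELECT 1 FROM schema_version WHERE version = 2)
BEGIN
    ALTER TABLE dbo.Customer ADD Email NVARCHAR(255) NULL;
    INSERT INTO schema_version (version) VALUES (2);
END;
```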
There are also tools that can compare two DB states and produce a diff that is able to change the first state to the second.
I suppose it depends on whether you manage the database changes (like schema changes, or migrations as mentioned by wRAR) as part of the source code repository, in the form of SQL scripts or other formats through the use of other tools, or whether you consider this database administration and handle it using traditional methods of backup/restore.
In my experience so far, although I wouldn't consider database management any less important, it does happen at a much lower frequency compared to actual code changes. Your case is clearly different, but a combination of script files and database tools should take care of that.
Here's the reality.
Database version control -- that is to say, the DDL, DML, and even the reference data required for an application to have basic functionality -- is as important as all other application assets under version control. Databases should never be under any special exception where it is considered acceptable for their assets (objects and necessary reference data) to not be under version control. Ever.
So why have they been? Simple. The toolchains that keep managing those assets simple haven't always been up to snuff (in the case of SQL Server, prior to Visual Studio 2008 there were no first-party tools shipped from Microsoft), and the toolchains differ on a vendor-by-vendor basis. When those toolchains are deficient, unless the organization steps up to cover that deficiency, that deficiency remains. It's technical debt, and some organizations do not prioritize it, due to either time or (sadly) skill, when the tools to make it easy don't exist or require integrating third-party tools into the development workflow.
The worst is trying to bring older projects under version control, since you have to bring everyone kicking and screaming with you all at once, in addition to selling that value to the business. I won't disagree that there may be more pressing immediate needs of the business, but getting database assets under version control needs to be somewhere on that list, even if it's a lower priority.
There's no excuse. I've fought more than enough project managers, data architects, and even CIOs/CTOs on this -- I've even made it a point to have detractors kicked off of every project I'm working on. It needs to be done, and if it's not there needs to be a timeline that the business will agree to in which it will be done. Those who argue against it need to be shot in the face, and survivors need to be shot again.

Anybody using SQL Source Control from Red Gate

We have been looking into possible solutions for our SQL source control. I just came across Red Gate's SQL Source Control and wondered if anyone has implemented it? I am going to download the trial and give it a shot, but I just wanted to see if others have real experience.
As always greatly appreciate the input
--S
I have updated my original post below to reflect changes in the latest versions of SQL Source Control (3.0) and SQL Compare (10.1).
Since this question was asked over a year ago, my response may not be that helpful to you, but for others who may currently be evaluating SSC, I thought I would throw in my two cents. We just started using SQL Source Control (SSC) and overall I am fairly satisfied with it so far. It does have some quirks though, especially if you are working in a shared database environment (as opposed to every developer working locally) and particularly working in a legacy environment where objects in the same database are divided haphazardly between development teams.
To give a brief overview of how we are using the product in our organization: we are working in a shared environment where we all make changes to the same development database, so we attached the shared database to the source control repository. Each developer is responsible for making changes to the objects in the database through SQL Server Management Studio (SSMS), and when they are finished, they can commit their changes to source control. When we are ready to deploy to staging, the build master (me) merges the development branch of the database code to the main (staging) branch and then runs SQL Compare, using the main branch repository version of the database as the source and the live staging database as the target; SQL Compare generates the necessary scripts to deploy the changes to the staging environment. Staging-to-production deployments work in similar fashion.

One other important point to note is that, given the fact that we are sharing the same database with other development teams, we use a built-in feature of SSC that allows you to create filters on database objects by name, type, etc. We manually set up filters on our specific team's objects, excluding all other objects, so that we don't accidentally commit other development teams' changes when we do our deployments.
So in general it's a fairly simple product to set up and use and it's really nice because you're always working with live objects in SSMS, as opposed to disconnected script files stored in a separate source repository that run the risk of getting out of sync. It's also nice because SQL Compare generates the deployment scripts for you so you don't have to worry about introducing errors as you would if you were creating the scripts on your own. And as SQL Compare is a very mature and stable product, you can feel pretty confident that it's going to create the proper scripts for you.
With that being said, however, here are some of the quirks that I have run into so far:
SSC is pretty chatty out of the box in terms of communicating with the DB server in order to keep track of database items that are out of sync with the source control repository. It polls every few milliseconds, and if you add in multiple developers all working against the same database using SSC, you can imagine that our DBAs weren't very happy. Fortunately, you can easily reduce the polling frequency to something more acceptable, although at the cost of sacrificing responsive visual notifications of when objects have been changed.
Using the object filtering feature, you can't easily tell from looking at objects in SSMS which objects are included in your filter. So you don’t know for sure if an object is under source control, unlike in Visual Studio, where icons are used to indicate source controlled objects.
The object filtering GUI is very clunky. Due to the fact that we are working in a legacy database environment, there is currently not a clear separation between the objects that our team owns and those owned by other teams, so in order to prevent us from accidentally committing/deploying other teams’ changes, we have set up a filtering scheme to explicitly include each specific object that we own. As you can imagine, this becomes quite cumbersome, and as the GUI to edit the filters is set up to enter one object at a time, it could become quite painful, especially trying to set up your environment for the first time (I ended up writing an application to do this). Going forward, we are creating a new schema for our application to better facilitate object filtering (besides being a better practice anyway).
Using the shared database model, developers are allowed to commit any pending changes to a source controlled database, even if the changes are not theirs. SSC does give you a warning if you try to check in a bunch of changes that these changes might not be yours, but other than that you’re on your own. I actually find this to be one of SSC’s most dangerous “quirks”.
SQL Compare can’t currently share the object filters created by SSC, so you would have to manually create a matching filter in SQL Compare, so there is a danger that these could get out of sync. I just ended up cut-and-pasting the filters from the underlying SSC filter file into the SQL Compare project filter to avoid dealing with the clunky object filtering GUI. I believe that the next version of SQL Compare will allow it to share filters with SSC, so at least this problem is only a short term one. (NOTE: This issue has been resolved in the latest version of SQL Compare. SQL Compare can now use the object filters created by SSC.)
SQL Compare also can’t compare against a SSC database repository when launched directly. It has to be launched from within SSMS. I believe that the next version of SQL Compare will provide this functionality, so again it’s another short term problem. (NOTE: This issue has been resolved in the latest version of SQL Compare.)
Sometimes SQL Compare isn’t able to create the proper scripts to get the target database from one state to another, usually in the case where you are updating the schema of existing tables that aren’t empty, so you currently have to write manual scripts and manage the process yourself. Fortunately, this will be addressed through “migration scripts” in the next release of SSC, and from looking at the early release version of the product, it appears that the implementation of this new feature was well thought out and designed. (NOTE: Migration scripts functionality has been officially released. However, it does not currently support branching. If you want to use migration scripts, you will need to run sql compare against your original development code branch... the one where you checked in your changes... which is pretty clunky and has forced me to modify my build process in a less than ideal way in order to work around this limitation. Hopefully this will be addressed in a future release.)
Overall, I am pretty happy with the product and with Redgate’s responsiveness to user feedback and the direction that the product is taking. The product is very easy to use and well designed, and I feel that in the next release or two the product will probably give us most, if not all, of what we need.
I use SQL Compare for generating scripts when going from dev -> test -> production and it saves me tons of time.
For source control, though, we use SVN and ScriptDB (http://scriptdb.codeplex.com/). I mainly use source control of SQL scripts for keeping track of changes. I think that rolling back a version of the database seldom (if ever) works, since data may have changed when making structure changes.
This works fine for a few of our current projects (largest is 200 tables and 2000 sprocs). The main reason for doing this though is cost since not all team members have to buy SQL Compare (I avoid adding dependencies to commercial projects unless really needed).
We performed an extensive evaluation of Red Gate's product and found a few major flaws. If you want to look at who changed an object, you can't do it without SysAdmin privileges. The product needs to look at the trace on your server, which requires those rights. I'm on a 5+ person team, and not knowing who had pending changes is what will stop us from using the product.
I just started working for a new company and they use Redgate SQL Source Control for all their projects, amongst them a large and complex one. It does the job well in tandem with TFS. The only drawback from my point of view is that the SQL Server Management Studio integration is highly unstable: frequent crashes of SQL Server Management Studio happen when the tools are installed.