I want to split the development of our data warehouse solution into manageable Visual Studio projects that can be edited independently and grow organically. I have developed a project that contains all the conformed dimensions, such as Date, in one solution. Can I reference my Date dimension from a different solution where I need to include a foreign key? I have tried referencing the dacpac containing the conformed dimension, but it does not work as expected.
In the past I've managed these situations using separate projects in a single solution. The main goal is clearly to avoid merge conflicts between developers; with this arrangement, the only place a conflict should happen is in the solution file, which should be pretty rare.
There are a couple of things to be careful of here:
The reference must be of type "Same database"
The reference needs to be in the direction Fact -> Dimension
When you deploy the "Fact" project, you need to be sure to specify the option to "Include Composite Objects". This is the default in Visual Studio.
Circular references are not allowed. If you have these the workaround (very briefly, google for more) is to create a "parent" project with references to both "sides" of the "circle".
That said, it is also possible to create a foreign key to a referenced dacpac rather than to a database project. Again, the reference needs to be in the direction Fact -> Dimension. You will also need to give some thought to your build process, as you are in effect taking a binary dependency on the "Dimensions" dacpac. You can also add the dacpac to the referencing project (I tend to create a folder for these) so it ends up in the same place in source control.
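To make that concrete, here is a minimal sketch of what the "Fact" side looks like once a "Same database" reference to the Dimensions project (or dacpac) is in place. The table and column names (dbo.DimDate, DateKey, dbo.FactSales) are illustrative placeholders, not taken from any real project:

```sql
-- The referenced "Dimensions" project/dacpac defines dbo.DimDate with a primary
-- key on DateKey. With a "Same database" reference, the table is addressed by its
-- ordinary two-part name from the Fact project.
CREATE TABLE dbo.FactSales
(
    SalesKey     INT   NOT NULL PRIMARY KEY,
    OrderDateKey INT   NOT NULL,
    SalesAmount  MONEY NOT NULL,
    CONSTRAINT FK_FactSales_DimDate
        FOREIGN KEY (OrderDateKey) REFERENCES dbo.DimDate (DateKey)
);
```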
In case it helps, I have created a solution that demonstrates both techniques and shared it here: https://github.com/gavincampbell-dev-example-repos/FKsToReferencedDacpacs
In short, you cannot create an FK referencing a table in another database; both tables must be located in the same database. An FK is a database object, not a server object.
You can, however, use triggers to emulate the FK, which is of course not the best solution.
FOREIGN KEY constraints can reference only tables within the same database on the same server. Cross-database referential integrity must be implemented through triggers. For more information, see CREATE TRIGGER (Transact-SQL).
https://msdn.microsoft.com/en-us/library/ms189049.aspx
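For illustration, a hedged sketch of the trigger workaround follows. The database, table, and column names are hypothetical, and a real implementation would also need a matching trigger on the parent table to block deletes, so this only approximates a true FK:

```sql
-- Checks that every new or changed row in the local fact table has a matching
-- parent row in the other database.
CREATE TRIGGER dbo.trg_FactSales_CheckOrderDateKey
ON dbo.FactSales
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    IF EXISTS (
        SELECT 1
        FROM inserted AS i
        WHERE NOT EXISTS (
            SELECT 1
            FROM OtherDb.dbo.DimDate AS d
            WHERE d.DateKey = i.OrderDateKey))
    BEGIN
        RAISERROR('OrderDateKey not found in OtherDb.dbo.DimDate.', 16, 1);
        ROLLBACK TRANSACTION;
    END
END;
```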
Based on your comment (see below), here are some thoughts:
You have several good arguments. Though I was looking for a way for us to model several business processes in parallel, this might not be an option. It seems the only option might be to have one big project and then figure out how to manage more than one developer working in the solution at the same time - thanks
One database = one project
Enforce communication between team members
Update your VCS policies to minimize problems
Educate the team members about possible problems and their workarounds
Including deployment options
Provide sufficient resources (including dev workstations) and use local instances for development
Build an SIT (integration) environment where you can test changes and how they behave during deployment.
Build a shared DEV environment which is used to merge the work of your team members
Commit/push files into the VCS that are considered done. Commit/push regularly
Route tasks to minimize conflicts (if a team member has something to do with object A, try to route all tasks related to object A to that developer or postpone the second change until the first one is done)
If you use dacpac for deployment, include the configuration file (tested in SIT env) next to the dacpac file. (SIT should reflect the production environment when you are testing the deployment)
And again: Enforce communication inside and outside the team. This is the key.
Dacpac is a nice solution for versioning the schema, but we have to use pre/post-deployment scripts to amend the reference data.
Is there a better solution for doing that?
The best way I have seen is to use MERGE statements, one table per file, and import them into your post-deploy script using :r imports.
You get version history and easily comparable data, and using sp_generate_merge makes it really simple.
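As an illustration (table, column, and file names are made up), a per-table reference data script generated by sp_generate_merge looks roughly like this, and the post-deploy script simply pulls each file in with :r:

```sql
-- e.g. ReferenceData\dbo.OrderStatus.data.sql, included from the post-deployment
-- script with:  :r .\ReferenceData\dbo.OrderStatus.data.sql
MERGE INTO dbo.OrderStatus AS target
USING (VALUES
    (1, N'Open'),
    (2, N'Shipped'),
    (3, N'Cancelled')
) AS source (OrderStatusId, StatusName)
    ON target.OrderStatusId = source.OrderStatusId
WHEN MATCHED AND target.StatusName <> source.StatusName THEN
    UPDATE SET StatusName = source.StatusName
WHEN NOT MATCHED BY TARGET THEN
    INSERT (OrderStatusId, StatusName) VALUES (source.OrderStatusId, source.StatusName)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;
```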
Ed
If you're looking for a solution in SSDT to handle reference data that does not involve the use of pre/post-deployment scripts, unfortunately there currently isn't one.
But it is currently one of the most requested features in SSDT so perhaps there's a chance it will get implemented some time in the future.
I am the maintainer of the sp_generate_merge OSS utility Ed mentioned, and at Redgate we recommend this approach to handling reference data in an offline way to our customers, in the following circumstances:
If the data within the table is changing very frequently, as the one-file-per-table approach allows for branching/merging of concurrent changes from multiple developers
If the data in the table contains environment-specific values, like application settings, as this method allows you to use SQLCMD variables and feed the values in from a deployment tool
Where the offline approach can be problematic:
Non-determinism of the MERGE statement: before actually running the deployment against your target environment, it can be difficult to know what changes will be applied (if any). In the worst case, you could hit one of the documented issues with MERGE
The workflow isn't necessarily the most natural way to edit data, as it requires running the utility proc and copying+pasting the output back into the original file. Editing the file directly is an alternative, but isn't the most user-friendly experience, especially with large amounts of reference data
Co-ordinating changes to both the schema and data within the reference table can be a challenge, given that SSDT is still responsible for applying the schema changes; for example, if you want to add a new NOT NULL column without a default (one way to handle this is sketched below)
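A hedged sketch of one way to coordinate that last point (all names are illustrative): declare the new NOT NULL column with a placeholder default in the project so SSDT can alter the existing rows, then fix up the real values in the post-deployment script.

```sql
-- Schema side (the table's .sql file in the SSDT project) carries a placeholder
-- default so existing rows can be altered, e.g.:
--   RegionCode CHAR(2) NOT NULL CONSTRAINT DF_Country_RegionCode DEFAULT ('??')
--
-- Data side (post-deployment script): replace the placeholder with real values.
UPDATE dbo.Country SET RegionCode = 'EU' WHERE CountryCode IN ('DE', 'FR');
UPDATE dbo.Country SET RegionCode = 'NA' WHERE CountryCode IN ('US', 'CA');
-- The placeholder default can be removed from the project in a later release,
-- once every environment has been through this upgrade.
```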
Another solution involves following an online approach, which is supported by our SSDT-alternative, ReadyRoll database projects. It allows the data to be edited directly in the database and subsequently imported into the project, with the sync script (i.e. containing INSERT, UPDATE, DELETE statements instead of MERGE) generated by its data comparison tool, alongside any schema changes.
You can read more about how the offline and online approaches differ in the ReadyRoll documentation.
I'm in the research phase trying to adopt 2012 Database Projects on an existing small project. I'm a C# developer, not a DBA, so I'm not particularly fluent with best practices. I've been searching google and stackoverflow for a few hours now but I still don't know how to handle some key deployment scenarios properly.
1) Over the course of several development cycles, how do I manage multiple versions of my database? If I have a client on v3 of my database and I want to upgrade them to v8, how do I manage this? We currently manage hand-crafted schema and data migration scripts for every version of our product. Do we still need to do this separately or is there something in the new paradigm that supports or replaces this?
2) If the schema changes in such a way that requires data to be moved around, what is the best way to handle this? I assume some work goes in the Pre-Deployment script to preserve the data and then the Post-Deploy script puts it back in the right place. Is that the way of it or is there something better?
3) Any other advice or guidance on how best to work with these new technologies is also greatly appreciated!
UPDATE: My understanding of the problem has grown a little since I originally asked this question and while I came up with a workable solution, it wasn't quite the solution I was hoping for. Here's a rewording of my problem:
The problem I'm having is purely data related. If I have a client on version 1 of my application and I want to upgrade them to version 5, I would have no problem doing so if their database had no data. I'd simply let SSDT intelligently compare schemas and migrate the database in one shot. Unfortunately clients have data, so it's not that simple. Schema changes from version 1 of my application to version 2 to version 3 (etc.) all impact data. My current strategy for managing data requires that I maintain a script for each version upgrade (1 to 2, 2 to 3, etc.). This prevents me from going straight from version 1 of my application to version 5 because I have no data migration script to go straight there. The prospect of creating custom upgrade scripts for every client, or of managing upgrade scripts to go from every version to every later version, is exponentially unmanageable. What I was hoping was that there was some sort of strategy SSDT enables that makes managing the data side of things easier, maybe even as easy as the schema side. My recent experience with SSDT has not given me any hope of such a strategy existing, but I would love to find out differently.
I've been working on this myself, and I can tell you it's not easy.
First, to address the reply by JT - you cannot dismiss "versions", even with the declarative updating mechanics that SSDT has. SSDT does a "pretty decent" job (provided you know all the switches and gotchas) of moving any source schema to any target schema, and it's true that this doesn't require versioning per se, but it has no idea how to manage "data motion" (at least not that I can see!). So, just like DBProj, you're left to your own devices in Pre/Post scripts. Because the data motion scripts depend on a known start and end schema state, you cannot avoid versioning the DB. The "data motion" scripts, therefore, must be applied to a versioned snapshot of the schema, which means you cannot arbitrarily update a DB from v1 to v8 and expect the data motion scripts for v2 through v8 to work (presumably, you wouldn't need a v1 data motion script).
Sadly, I can't see any mechanism in SSDT publishing that allows me to handle this scenario in an integrated way. That means you'll have to add your own scaffolding.
The first trick is to track versions within the database (and SSDT project). I started using a trick in DBProj, and brought it over to SSDT, and after doing some research, it turns out that others are using this too. You can apply a DB Extended Property to the database itself (call it "BuildVersion" or "AppVersion" or something like that), and apply the version value to it. You can then capture this extended property in the SSDT project itself, and SSDT will add it as a script (you can then check the publish option that includes extended properties). I then use SQLCMD variables to identify the source and target versions being applied in the current pass. Once you identify the delta of versions between the source (project snapshot) and target (target db about to be updated), you can find all the snapshots that need to be applied. Sadly, this is tricky to do from inside the SSDT deployment, and you'll probably have to move it to the build or deployment pipeline (we use TFS automated deployments and have custom actions to do this).
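As a rough sketch of the extended-property trick (the property name, the SQLCMD variable, and the queries here are illustrative, not anything SSDT provides out of the box):

```sql
-- In the project, scripted as part of the database, so the deployed DB carries
-- its own version stamp. $(BuildVersion) is a SQLCMD variable fed in at publish time.
EXEC sp_addextendedproperty @name = N'BuildVersion', @value = N'$(BuildVersion)';

-- In the build/deployment pipeline, read the target's current version before
-- deciding which data motion scripts still need to run:
SELECT CAST(value AS NVARCHAR(32)) AS BuildVersion
FROM sys.extended_properties
WHERE class = 0 AND name = N'BuildVersion';
```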
The next hurdle is to keep snapshots of the schema with their associated data motion scripts. In this case, it helps to make the scripts as idempotent as possible (meaning, you can rerun the scripts without any ill side-effects). It also helps to split scripts that can safely be rerun from scripts that must be executed one time only. We're doing the same thing with static reference data (dictionary or lookup tables) - in other words, we have a library of MERGE scripts (one per table) that keep the reference data in sync, and these scripts are included in the post-deployment scripts (via the SQLCMD :r command). The important thing to note here is that you must execute them in the correct order in case any of these reference tables have FK references to each other. We include them in the main post-deploy script in order, and it helps that we created a tool that generates these scripts for us - it also resolves dependency order. We run this generation tool at the close of a "version" to capture the current state of the static reference data. All your other data motion scripts are basically going to be special-case and most likely single-use only. In that case, you can do one of two things: you can use an IF statement against the db build/app version, or you can wipe out the one-time scripts after creating each snapshot package.
It helps to remember that SSDT will disable FK check constraints and only re-enable them after the post-deployment scripts run. This gives you a chance to populate new non-null fields, for example (by the way, you have to enable the option to generate temporary "smart" defaults for non-null columns to make this work). However, FK check constraints are only disabled for tables that SSDT is recreating because of a schema change. For other cases, you are responsible for ensuring that data motion scripts run in the proper order to avoid constraint complaints (or you have to manually disable/re-enable them in your scripts).
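For the cases SSDT doesn't handle, a data motion step can bracket itself with the usual NOCHECK/CHECK commands; a minimal sketch (table and constraint names assumed):

```sql
-- Temporarily stop the FK from firing while rows are shuffled around.
ALTER TABLE dbo.OrderLine NOCHECK CONSTRAINT FK_OrderLine_Order;

-- ... data motion statements go here ...

-- Re-enable and re-validate existing rows (WITH CHECK keeps the constraint trusted).
ALTER TABLE dbo.OrderLine WITH CHECK CHECK CONSTRAINT FK_OrderLine_Order;
```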
DACPAC can help you because DACPAC is essentially a snapshot. It will contain several XML files describing the schema (similar to the build output of the project), but frozen in time at the moment you create it. You can then use SQLPACKAGE.EXE or the deploy provider to publish that package snapshot. I haven't quite figured out how to use the DACPAC versioning, because it's more tied to "registered" data apps, so we're stuck with our own versioning scheme, but we do put our own version info into the DACPAC filename.
I wish I had a more conclusive and exhaustive example to provide, but we're still working out the issues here too.
One thing that really sucks about SSDT is that unlike DBProj, it's currently not extensible. Although it does a much better job than DBProj at a lot of different things, you can't override its default behavior unless you can find some method inside of pre/post scripts of getting around a problem. One of the issues we're trying to resolve right now is that the default method of recreating a table for updates (CCDR) really stinks when you have tens of millions of records.
UPDATE: I haven't seen this post in some time, but apparently it's been active lately, so I thought I'd add a couple of important notes: if you are using VS2012, the June 2013 release of SSDT now has a Data Comparison tool built in, and it also provides extensibility points - that is to say, you can now include Build Contributors and Deployment Plan Modifiers in the project.
I haven't really found any more useful information on the subject but I've spent some time getting to know the tools, tinkering and playing, and I think I've come up with some acceptable answers to my question. These aren't necessarily the best answers. I still don't know if there are other mechanisms or best practices to better support these scenarios, but here's what I've come up with:
The Pre- and Post-Deploy scripts for a given version of the database are only used to migrate data from the previous version. At the start of every development cycle, the scripts are cleaned out, and as development proceeds they get fleshed out with whatever SQL is needed to safely migrate data from the previous version to the new one. The one exception here is static data in the database. This data is known at design time and maintains a permanent presence in the Post-Deploy scripts in the form of T-SQL MERGE statements. This makes it possible to deploy any version of the database to a new environment with just the latest publish script. At the end of every development cycle, a publish script is generated from the previous version to the new one. This script will include generated SQL to migrate the schema and the hand-crafted deploy scripts. Yes, I know the Publish tool can be used directly against a database, but that's not a good option for our clients. I am also aware of dacpac files, but I'm not really sure how to use them. The generated publish script seems to be the best option I know of for production upgrades.
So to answer my scenarios:
1) To upgrade a database from v3 to v8, I would have to execute the generated publish script for v4, then for v5, then for v6, etc. This is very similar to how we do it now. It's well understood and Database Projects seem to make creating/maintaining these scripts much easier.
2) When the schema changes from underneath the data, the Pre- and Post-Deploy scripts are used to migrate the data to where it needs to go for the new version. Affected data is essentially backed up in the Pre-Deploy script and put back into place in the Post-Deploy script.
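For instance, a hedged sketch of that back-up/restore pattern, using a hypothetical split of a FullName column into FirstName/LastName (all names are illustrative):

```sql
-- Pre-Deploy: park the affected data before SSDT changes the schema.
IF OBJECT_ID('dbo.Customer_FullName_backup') IS NOT NULL
    DROP TABLE dbo.Customer_FullName_backup;

SELECT CustomerId, FullName
INTO dbo.Customer_FullName_backup
FROM dbo.Customer;

-- Post-Deploy: push the parked data into the new columns, then clean up.
UPDATE c
SET    c.FirstName = LEFT(b.FullName, CHARINDEX(' ', b.FullName + ' ') - 1),
       c.LastName  = LTRIM(STUFF(b.FullName, 1, CHARINDEX(' ', b.FullName + ' '), ''))
FROM   dbo.Customer AS c
JOIN   dbo.Customer_FullName_backup AS b ON b.CustomerId = c.CustomerId;

DROP TABLE dbo.Customer_FullName_backup;
```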
3) I'm still looking for advice on how best to work with these tools in these scenarios and others. If I got anything wrong here, or if there are any other gotchas I should be aware of, please let me know! Thanks!
In my experience of using SSDT the notion of version numbers (i.e. v1, v2...vX etc...) for databases kinda goes away. This is because SSDT offers a development paradigm known as declarative database development which loosely means that you tell SSDT what state you want your schema to be in and then let SSDT take responsibility for getting it into that state by comparing against what you already have. In this paradigm the notion of deploying v4 then v5 etc.... goes away.
Your pre and post deployment scripts, as you correctly state, exist for the purposes of managing data.
Hope that helps.
JT
I just wanted to say that this thread so far has been excellent.
I have been wrestling with the exact same concerns and am attempting to tackle this problem in our organization, on a fairly large legacy application. We've begun the process of moving toward SSDT (on a TFS branch) but are at the point where we really need to understand the deployment process, and managing custom migrations, and reference/lookup data, along the way.
To complicate things further, our application is one code-base but can be customized per 'customer', so we have about 190 databases we are dealing with for this one project, not just 3 or so as is probably normal. We do deployments all the time and even set up new customers fairly often. We rely heavily on PowerShell now with old-school incremental release scripts (and associated scripts to create a new customer at that version). I plan to contribute once we figure this all out, but please share whatever else you've learned. I do believe we will end up maintaining custom release scripts per version, but we'll see. The idea about maintaining each script within the project, and including a From and To SqlCmd variable, is very interesting. If we did that, we would probably prune along the way, physically deleting the really old upgrade scripts once everybody was past that version.
BTW - Side note - On the topic of minimizing waste, we also just spent a bunch of time figuring out how to automate the enforcement of proper naming/data type conventions for columns, as well as automatic generation for all primary and foreign keys, based on naming conventions, as well as index and check constraints etc. The hardest part was dealing with the 'deviants' that didn't follow the rules. Maybe I'll share that too one day if anyone is interested, but for now, I need to pursue this deployment, migration, and reference data story heavily. Thanks again. It's like you guys were speaking exactly what was in my head and looking for this morning.
Do you use SchemaExport and SchemaUpdate in real applications? Do you initially create the model and then generate the schema from it? Does it work? Or do you use them only for tests?
Usually, I create the db (using a Visual Studio database project) and then create mappings and persistent classes, or EF entities using the designer. But now I want to try the code-first approach with Fluent NHibernate.
I have researched SchemaExport and SchemaUpdate and found some issues. For example, update doesn't delete db objects, creates NOT NULL columns as nullable if the table already exists, doesn't generate a primary key on many-to-many tables, and so on. That means I have to recreate the db very often. But what about the data? And how do I deploy changes to the production db, and so on?
I want to know whether you really use code first and SchemaExport/SchemaUpdate in your applications. Maybe you can give me some advice...
I use SchemaUpdate in production. It is safe precisely because it never does destructive operations like deleting columns. However, it is not a comprehensive solution for updating your database. If you use it you will still have to supplement it with scripts to update your schema for things like deletions (as you mention), indexes, changing column types, adding table data, etc. But SchemaUpdate covers the 90% case for me.
The only downside I've discovered is that over time it seems to occasionally add duplicate foreign-key constraints to my table.
One more thing: you should run SchemaUpdate manually from a build tool, not your app itself. It is not safe to give your application the rights to modify your db schema!
I use SchemaUpdate/SchemaExport for rapid evolution of my model, but they are not a replacement for a database migration tool. As you mention, data cannot be migrated in a sensible manner in many cases. The tool does not have enough context. (e.g. How can you automatically migrate a FullName column to FirstName/LastName?) I answered a similar question here where I discuss db migration tools in the context of NHibernate.
NHibernate, ORM : how is refactoring handled? existing data?
Yes, you can use these in real applications; I do.
Of course, almost all the work happens in that first go. My practice has been to create a separate project that references the mappings in my main project assembly and handles database creation and the initial data import, if any.
Once the project is in production, I usually unload that project from the solution, but keep it around for reference or if I ever need to switch from create scripts to update scripts.
As for the way NHibernate creates the database, you have to do a little more specification in your Fluent mappings than you otherwise might. I like to specify null/not null, foreign key constraint names, etc. to have maximum control over the way the database gets created.
I don't think you'd ever want to use automapping in this scenario.
As with any generated code, whether it be POCO generation from a tool or database generation as in your question, it will probably get you 80% of the way there. From there it would be wise to tweak the remaining 20% to add your indexes and any other performance tweaks to get it just right.
Up until now, my experience with databases has always been working with an intermediate definition layer that we have where I work. i.e. SQL wasn't directly written for the table definitions, but generated from an intermediate file which wrote out SQL scripts for creating the appropriate tables, upgrade scripts between schema changes, and helper functions for doing simple queries/updates/inserts/deletes from the database.
Now I'm in a situation where I don't have access to that, for reasons I won't get into, and I find myself somewhat lost at sea regarding what to do. I need to have a small number of tables in a database, and I'm unsure what's usually done to manage the table definitions.
Do people normally just use the SQL script that does the table creation as their definition, or does everyone just use an IDE that manages the definition in a separate file and regenerates the SQL script to create the tables?
I'd really prefer not to have to introduce a dependence on a specific IDE, because as we all know, developers are whiners that are prone to religious debates over small things.
Open your favorite text editor -> Start writing CREATE scripts -> Save -> Put in Source Control
That script now becomes the basis for your database. Any time there are schema changes, they get put back into the scripts so that they don't get lost.
These become your definition.
I find it more reliable than depending on any specific IDE/Platform generating those scripts for you.
We write the scripts ourselves and store them in source control like any other code. Then the scripts that are appropriate for a particular version are all grouped together and promoted to prod together. Make sure to use ALTER TABLE when changing existing tables, because you don't want to drop and recreate them if they have data! I use a drop and recreate for all other objects though. If you need to add records to a particular table (usually a lookup of some type), we do that in scripts as well. Then that too gets promoted with the rest of the version code.
For me, putting the scripts in source control, however they are generated, is the key step. This is how you know what you have changed for the next release. This is how you can see earlier versions and revert back easily if there is a problem. Treat database code the same way you treat all other code.
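A short sketch of what one of those hand-written version scripts might look like (object and column names invented for illustration):

```sql
-- Existing table: alter, never drop-and-recreate, so the data survives.
ALTER TABLE dbo.Customer ADD MiddleName NVARCHAR(50) NULL;
GO

-- Other objects (procs, views, functions): drop and recreate is fine.
IF OBJECT_ID('dbo.usp_GetCustomer', 'P') IS NOT NULL
    DROP PROCEDURE dbo.usp_GetCustomer;
GO
CREATE PROCEDURE dbo.usp_GetCustomer @CustomerId INT
AS
    SELECT CustomerId, FirstName, MiddleName, LastName
    FROM dbo.Customer
    WHERE CustomerId = @CustomerId;
GO

-- Lookup data for this version, kept rerunnable.
IF NOT EXISTS (SELECT 1 FROM dbo.OrderStatus WHERE OrderStatusId = 4)
    INSERT dbo.OrderStatus (OrderStatusId, StatusName) VALUES (4, N'On Hold');
GO
```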
You could use one of the data modelling tools that create the scripts for you if you are starting out on a database design and eventually want the tool to generate it for you. Some tools for that are Erwin, Fabforce, etc. (though not free).
If you have access to an IDE like SQL Server Management Studio, you can create them using a GUI that's pretty simple.
If you are writing your own code, it's always better to write your own scripts based on a good template so that you cover all the properties of the table definition, like the filegroup, collation and such. Hope this helps.
Once you do create a base copy, generate scripts and keep a base reference copy of them so that you can make "incremental" changes and manage them in source control.
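For example, a minimal table-creation template along those lines might look like this (names, collation, and the [DATA] filegroup are placeholders):

```sql
CREATE TABLE dbo.Customer
(
    CustomerId INT           NOT NULL IDENTITY(1, 1),
    FullName   NVARCHAR(200) COLLATE Latin1_General_CI_AS NOT NULL,
    CreatedAt  DATETIME2(3)  NOT NULL
        CONSTRAINT DF_Customer_CreatedAt DEFAULT (SYSUTCDATETIME()),
    CONSTRAINT PK_Customer PRIMARY KEY CLUSTERED (CustomerId)
) ON [DATA];
```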
Though I use TOAD for Oracle, I always write the scripts to create my database objects by hand. It gives you (and your DBAs) more control over and knowledge of what's being created and how.
If your schema is too difficult to describe in SQL, you probably have other issues more pressing than which IDE. Use modelling documentation if you need a graphical representation, but yeah, you don't need an IDE.
There are multiple ways out there for what you are asking.
Old traditional way is to have a script file ready with your application that has CREATE TABLE statement.
If you are a developer, and especially a Java enterprise developer, you could generate the complete schema using a persistence library called Hibernate. Here is a how-to.
If you are a DBA-level user, you could take a schema export from one environment and import it into your current environment. This is a standard practice among DBAs, but it requires admin access, as you can see. Also, the methods depend on the database you are using (Oracle, DB2, etc.).
And how do you keep them in sync between test and production environments?
When it comes to indexes on database tables, my philosophy is that they are an integral part of writing any code that queries the database. You can't introduce new queries or change a query without analyzing the impact to the indexes.
So I do my best to keep my indexes in sync between all of my environments, but to be honest, I'm not doing very well at automating this. It's a sort of haphazard, manual process.
I periodically review index stats and delete unnecessary indexes. I usually do this by creating a delete script that I then copy back to the other environments.
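If it's useful to anyone, here is a hedged sketch of how such a delete script can be generated on SQL Server. Treat the output as candidates to review rather than something to run blindly, since usage stats reset on every restart:

```sql
-- Generate DROP INDEX statements for non-clustered indexes with no recorded reads.
SELECT 'DROP INDEX ' + QUOTENAME(i.name)
       + ' ON ' + QUOTENAME(SCHEMA_NAME(o.schema_id)) + '.' + QUOTENAME(o.name) + ';' AS drop_statement
FROM sys.indexes AS i
JOIN sys.objects AS o
    ON o.object_id = i.object_id
LEFT JOIN sys.dm_db_index_usage_stats AS s
    ON s.object_id = i.object_id
   AND s.index_id = i.index_id
   AND s.database_id = DB_ID()
WHERE o.is_ms_shipped = 0
  AND i.type_desc = 'NONCLUSTERED'
  AND i.is_primary_key = 0
  AND i.is_unique_constraint = 0
  AND ISNULL(s.user_seeks, 0) + ISNULL(s.user_scans, 0) + ISNULL(s.user_lookups, 0) = 0;
```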
But here and there indexes get created and deleted outside of the normal process and it's really tough to see where the differences are.
I've found one thing that really helps is to go with simple, numeric index names, like
idx_t_01
idx_t_02
where t is a short abbreviation for a table. I find index maintenance impossible when I try to get clever with all the columns involved, like,
idx_c1_c2_c5_c9_c3_c11_5
It's too hard to differentiate indexes like that.
Does anybody have a really good way to integrate index maintenance into source control and the development lifecycle?
Indexes are a part of the database schema and hence should be source controlled along with everything else. Nobody should go around creating indexes on production without going through the normal QA and release process - particularly performance testing.
There have been numerous other threads on schema versioning.
The full schema for your database should be in source control right beside your code. When I say "full schema" I mean table definitions, queries, stored procedures, indexes, the whole lot.
When doing a fresh installation, then you do:
- check out version X of the product.
- from the "database" directory of your checkout, run the database script(s) to create your database.
- use the codebase from your checkout to interact with the database.
When you're developing, every developer should be working against their own private database instance. When they make schema changes, they check in a new set of schema definition files that work against their revised codebase.
With this approach you never have codebase-database sync issues.
Yes, any DML or DDL changes are scripted and checked in to source control, mostly through ActiveRecord migrations in Rails. I hate to continually toot Rails' horn, but in many years of building DB-based systems I find the migration route to be so much better than any home-grown system I've used or built.
However, I do name all my indexes (don't let the DBMS come up with whatever crazy name it picks). Don't prefix them, that's silly (because you have type metadata in sysobjects, or in whatever db you have), but I do include the table name and columns, e.g. tablename_col1_col2.
That way, if I'm browsing sysobjects I can easily see the indexes for a particular table (also it's a force of habit - way back in the day, on some DBMS I used, index names were unique across the whole DB, so the only way to ensure that was to use unique names).
I think there are two issues here: the index naming convention, and adding database changes to your source control/lifecycle. I'll tackle the latter issue.
I've been a Java programmer for a long time now, but have recently been introduced to a system that uses Ruby on Rails for database access for part of the system. One thing that I like about RoR is the notion of "migrations". Basically, you have a directory full of files that look like 001_add_foo_table.rb, 002_add_bar_table.rb, 003_add_blah_column_to_foo.rb, etc. These Ruby source files extend a parent class, overriding methods called "up" and "down". The "up" method contains the set of database changes that need to be made to bring the previous version of the database schema to the current version. Similarly, the "down" method reverts the change back to the previous version. When you want to set the schema for a specific version, the Rails migration scripts check the database to see what the current version is, then finds the .rb files that get you from there up (or down) to the desired revision.
To make this part of your development process, you can check these into source control, and season to taste.
There's nothing specific or special about Rails here, just that it's the first time I've seen this technique widely used. You can probably use pairs of SQL DDL files, too, like 001_UP_add_foo_table.sql and 001_DOWN_remove_foo_table.sql. The rest is a small matter of shell scripting, an exercise left to the reader.
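For instance, such a pair might contain nothing more than this (the foo table is just the placeholder from the example file names above):

```sql
-- 001_UP_add_foo_table.sql
CREATE TABLE dbo.foo
(
    foo_id INT          NOT NULL PRIMARY KEY,
    name   NVARCHAR(50) NOT NULL
);

-- 001_DOWN_remove_foo_table.sql
DROP TABLE dbo.foo;
```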
I always source-control SQL (DDL, DML, etc). It's code like any other. It's good practice.
I am not sure indexes should be the same across different environments since they have different data sizes. Unless your test and production environments have the same exact data, the indexes would be different.
As to whether they belong in source control, I'm not really sure.
I do not put my indexes in source control but the creation script of the indexes. ;-)
Index-naming:
IX_CUSTOMER_NAME for the field "name" in the table "customer"
PK_CUSTOMER_ID for the primary key,
UI_CUSTOMER_GUID for the GUID field of the customer, which is unique (hence the "UI" - unique index).
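In T-SQL those would be created roughly like this (table and column names assumed):

```sql
ALTER TABLE customer ADD CONSTRAINT PK_CUSTOMER_ID PRIMARY KEY (id);

CREATE INDEX IX_CUSTOMER_NAME ON customer (name);

CREATE UNIQUE INDEX UI_CUSTOMER_GUID ON customer (guid);
```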
On my current project, I have two things in source control - a full dump of an empty database (using pg_dump -c so it has all the ddl to create tables and indexes) and a script that determines what version of the database you have, and applies alters/drops/adds to bring it up to the current version. The former is run when we're installing on a new site, and also when QA is starting a new round of testing, and the latter is run at every upgrade. When you make database changes, you're required to update both of those files.
With a Grails app, the indexes are stored in source control by default, since you define the index inside the file that represents your domain object. Just offering the Grails perspective as an FYI.