I wonder what is the best way to implement global data version for database. I want for any modification that is done to the database to incerease the version in "global version table" by one. I need this so that when I talk to application users I know what version of data we are talking about.
Should I store this information in table?
Should I use triggers for this?
This version number can be stored in a configuration table or in a dedicated table (with one field).
This parameter should not be automatically updated because you are the owner of the schema and you are responsible for knowing when you need to update it. Basically, you need to update this number every time you deploy a new application package (regardless of the reason for the package: code or database change).
Each and every deployment package should take care of updating the schema version number and the database schema (if necessary)
I tend to have a globals or settings table with various pseudo-static values stored.
- Just one row
- Many fields
This can include version numbers.
In terms of maintaining the version number you refer to, would this change when the data content changes? If so, the a trigger would be useful. If you mean for the version number to relate to table structures, etc, I'd be more inclined to manage this by hand. (Some changes may be irrelevant as far as teh applications are concerned, or there maybe several changes wrapped up into a single version upgrade.)
The best way to implement a "global data version for database" is via your source control system and build process. When all the changes have been submitted and passed testing your build process will increment your versioning number schema.
The version number could be implemented in a stored procedure. The result of the call to the stored proc could be added to a screen in your app so you can avoid users directly accessing a table.
To complete the previous answers, I came across the concept of "Migrations" (from the Ruby on Rails world apparently) today, and there was already a question on SO that covered existing frameworks in .Net.
The concept is still to store DB versioning information as data in a table somewhere, but for that versioning information to be managed automatically by a framework, rather than manually by your custom deployment processes:
previous SO question with overview of options: https://stackoverflow.com/questions/313/net-migrations-engine
We have a CMS built entirely in house. I'm the new web developer guy with literally 4 weeks of ColdFusion Experience. What I want to do is add version control to our dynamic pages. Something like what Wordpress does. When you modify a page in Wordpress it makes some database entires and keeps a copy of each page when you save it. So if you create a page and modifiy it 6 times, all in one day you have 7 different versions to roll back if necessary. Is there a easy way to do something similar in Coldfusion?
Please note I'm not talking about source control or version control of actual CFM files, all pages are done on the backend dynamically using SQL.
sure you can. just stash the page content in another database table. you can do that with ColdFusion or via a trigger in the database.
One way (there are many) to do this is to add a column called "version" and a column called "live" in the table where you're storing all of your cms pages.
The column called live is option but might make it easier for your in some ways when starting out.
The column "version" will tell you what revision number of a document in the CMS you have. By a process of elimination you could say the newest one (highest version #) would be the latest and live one. However, you may need to override this some time and turn an old page live, which is what the "live" setting can be set to.
So when you click "edit" on a page, you would take that version that was clicked, and copy it into a new higher version number. It stays as a draft until you click publish (at which time it's written as 'live')..
I hope that helps. This kind of an approach should work okay with most schema designs but I can't say for sure either without seeing it.
Jas' solution works well if most of the changes are to one field, for example the full text of a page of content.
However, if you have many fields, and people only tend to change one or two at a time, a new entry in to the table for each version can quickly get out of hand, with many almost identical versions in the history.
In this case what i like to do is store the changes on a per field basis in a table ChangeHistory. I include the table name, row ID, field name, previous value, new value, and who made the change and when.
This acts as a complete change history for any field in any table. I'm also able to view changes by record, by user, or by field.
For realtime page generation from the database, your best bet are "live" and "versioned" tables. Reason being keeping all data, live and versioned, in one table will negatively impact performance. So if page generation relies on a single SELECT query from the live table you can easily version the result set using ColdFusion's Web Distributed Data eXchange format (wddx) via the tag <cfwddx>. WDDX is a serialized data format that works particularly well with ColdFusion data (sorta like Python's pickle, albeit without the ability to deal with objects).
The versioned table could be as such:
PageID
Created
Data
Where data is the column storing the WDDX.
Note, you could also use built-in JSON support as well for version serialization (serializeJSON & deserializeJSON), but cfwddx tends to be more stable.
I'm building an application that runs on a Windows Mobile device. I'm using Microsoft's Sync Framework to sync the Sql CE database with the main corporate db.
The question is how can I limit the fields that are syncronized? The table in question has stacks of fields but I only need to display a few of them on the mobile device and replication is only one way (from the server to the mobile) so that shouldn't be an issue. I've seen this similar question but there's not much info there. Can anyone give me more advice on how to achieve this? I imagine that it's a very common requirement.
Also, does anyone know if I can use the Sync Framework Version 2.0 or do I have to stick to 1.0. I had a feeling that 2.0 doesn't support Windows Mobile but I'm not sure.
Cheers
Mark
You can change the T-SQL that's generated behind the scenes to not include all the columns of the table, but there are a couple of gotchas here. Firstly, it means that you can't use a wizard to modify the sync selection later - not a big deal, and creating your own partial class to override just the specific method with the T-SQL for your table mitigates that a bit.
Second, changes to the unincluded (not sure if that's a word?) columns can also trigger a download of that row as by default the change tracking is by row. You can change this by setting the Track_Columns_Updated flag
ALTER TABLE Employee
ENABLE CHANGE_TRACKING
WITH (TRACK_COLUMNS_UPDATED = ON)
Depending on the number of rows and size of the data and frequency updated, I have often found an easier solution is to provide a trigger on the main table of the server to update records in a separate table containing just the data you need, then sync that. It makes it much easier to change what's downloaded later. This is obviously not a solution if you are downloading the entire works of Shakespeare, but for a few 1000 records of a product catalogue, I think it's perfectly feasible.
Every week I access server logs processed by WebTrends (for about 7 profiles) and copy ad clickthrough and visitor information into Excel spreadsheets. A lot of it is just accessing certain sections and finding the right title and then copying the unique visitor information.
I tried using WebTrends' built-in query tool but that is really poorly done (only uses a drag-and-drop system instead of text-based) and it has a maximum number of parameters and maximum length of queries to query with. As far as I know, the tools in WebTrends are not suitable to my purpose of automating the entire web metrics gathering process.
I've gotten access to the raw server logs, but it seems redundant to parse that given that they are already being processed by WebTrends.
To me it seems very scriptable, but how would I go about doing that? Is screen-scraping an option?
I use ODBC for querying metrics and numbers out of webtrends. We even fill a scorecard with all key performance metrics..
Its in German, but maybe the idea helps you: http://www.web-scorecard.net/
Michael
Which version of WebTrends are you using? Unless this is a very old install, there should be options to schedule these reports to be emailed to you, and also to bookmark queries. Let me know which version it is and I can make some recommendations.
I'm using SQL-Server 2008 with Visual Studio Database Edition.
With this setup, keeping your schema in sync is very easy. Basically, there's a 'compare schema' tool that allow me to sync the schema of two databases and/or a database schema with a source-controlled creation script folder.
However, the situation is less clear when it comes to data, which can be of three different kind :
static data referenced in the code. typical example : my users can change their setting, and their configuration is stored on the server. However, there's a system-wide default value for each setting that is used in case the user didn't override it. The table containing those default settings grows as more options are added to the program. This means that when a new feature/option is checked in, the system-wide default setting is usually created in the database as well.
static data. eg. a product list populating a dropdown list. The program doesn't rely on the existence of a specific product in the list to work. This can be for example a list of unicode-encoded products that should be deployed in production when the new "unicode version" of the program is deployed.
other data, ie everything else (logs, user accounts, user data, etc.)
It seems obvious to me that my third item shouldn't be source-controlled (of course, it should be backuped on a regular basis)
But regarding the static data, I'm wondering what to do.
Should I append the insert scripts to the creation scripts? or maybe use separate scripts?
How do I (as a developer) warn the people doing the deployment that they should execute an insert statement ?
Should I differentiate my two kind of data? (the first one being usually created by a dev, while the second one is usually created by a non-dev)
How do you manage your DB static data ?
I have explained the technique I used in my blog Version Control and Your Database. I use database metadata (in this case SQL Server extended properties) to store the deployed application version. I only have scripts that upgrade from version to version. At startup the application reads the deployed version from the database metadata (lack of metadata is interpreted as version 0, ie. nothing is yet deployed). For each version there is an application function that upgrades to the next version. Usually this function runs an internal resource T-SQL script that does the upgrade, but it can be something else, like deploying a CLR assembly in the database.
There is no script to deploy the 'current' database schema. New installments iterate trough all intermediate versions, from version 1 to current version.
There are several advantages I enjoy by this technique:
Is easy for me to test a new version. I have a backup of the previous version, I apply the upgrade script, then I can revert to the previous version, change the script, try again, until I'm happy with the result.
My application can be deployed on top of any previous version. Various clients have various deployed version. When they upgrade, my application supports upgrade from any previous version.
There is no difference between a fresh install and an upgrade, it runs the same code, so I have fewer code paths to maintain and test.
There is no difference between DML and DDL changes (your original question). they all treated the same way, as script run to change from one version to next. When I need to make a change like you describe (change a default), I actually increase the schema version even if no other DDL change occurs. So at version 5.1 the default was 'foo', in 5.2 the default is 'bar' and that is the only difference between the two versions, and the 'upgrade' step is simply an UPDATE statement (followed of course by the version metadata change, ie. sp_updateextendedproperty).
All changes are in source control, part of the application sources (T-SQL scripts mostly).
I can easily get to any previous schema version, eg. to repro a customer complaint, simply by running the upgrade sequence and stopping at the version I'm interested in.
This approach saved my skin a number of times and I'm a true believer now. There is only one disadvantage: there is no obvious place to look in source to find 'what is the current form of procedure foo?'. Because the latest version of foo might have been upgraded 2 or 3 versions ago and it wasn't changed since, I need to look at the upgrade script for that version. I usually resort to just looking into the database and see what's in there, rather than searching through the upgrade scripts.
One final note: this is actually not my invention. This is modeled exactly after how SQL Server itself upgrades the database metadata (mssqlsystemresource).
If you are changing the static data (adding a new item to the table that is used to generate a drop-down list) then the insert should be in source control and deployed with the rest of the code. This is especially true if the insert is needed for the rest of the code to work. Otherwise, this step may be forgotten when the code is deployed and not so nice things happen.
If static data comes from another source (such as an import of the current airport codes in the US), then you may simply need to run an already documented import process. The import process itself should be in source control (we do this with all our SSIS packages), but the data need not be.
Here at Red Gate we recently added a feature to SQL Data Compare allowing static data to be stored as DML (one .sql file for each table) alongside the schema DDL that is currently supported by SQL Compare.
To understand how this works, here is a diagram that explains how it works.
The idea is that when you want to push changes to your target server, you do a comparison using the scripts as the source data source, which generates the necessary DML synchronization script to update the target. This means you don't have to assume that the target is being recreated from scratch each time. In time we hope to support static data in our upcoming SQL Source Control tool.
David Atkinson, Product Manager, Red Gate Software
I have come across this when developing CMS systems.
I went with appending the static data (the stuff referenced in the code) to the database creation scripts, then a separate script to add in any 'initialisation data' (like countries, initial product population etc).
For the first two steps, you could consider using an intermediate format (ie XML) for the data, then using a home grown tool, or something like CodeSmith to generate the SQL, and possible source files as well, if (for example) you have lookup tables which relate to enumerations used in the code - this helps enforce consistency.
This has another benefit that if the schema changes, in many cases you don't have to regenerate all your INSERT statements - you just change the tool.
I really like your distinction of the three types of data.
I agree for the third.
In our application, we try to avoid putting in the database the first, because it is duplicated (as it has to be in the code, the database is a duplicate). A secondary benefice is that we need no join or query to get access to that value from the code, so this speed things up.
If there is additional information that we would like to have in the database, for example if it can be changed per customer site, we separate the two. Other tables can still reference that data (either by index ex: 0, 1, 2, 3 or by code ex: EMPTY, SIMPLE, DOUBLE, ALL).
For the second, the scripts should be in source-control. We separate them from the structure (I think they typically are replaced as time goes, while the structures keeps adding deltas).
How do I (as a developer) warn the people doing the deployment that they should execute an insert statement ?
We have a complete procedure for that, and a readme coming with each release, with scripts and so on...
First off, I have never used Visual Studio Database Edition. You are blessed (or cursed) with whatever tools this utility gives you. Hopefully that includes a lot of flexibility.
I don't know that I'd make that big a difference between your type 1 and type 2 static data. Both are sets of data that are defined once and then never updated, barring subsequent releases and updates, right? In which case the main difference is in how or why the data is as it is, and not so much in how it is stored or initialized. (Unless the data is environment-specific, as in "A" for development, "B" for Production. This would be "type 4" data, and I shall cheerfully ignore it in this post, because I've solved it useing SQLCMD variables and they give me a headache.)
First, I would make a script to create all the tables in the database--preferably only one script, otherwise you can have a LOT of scripts lying about (and find-and-replace when renaming columns becomes very awkward). Then, I would make a script to populate the static data in these tables. This script could be appended to the end of the table script, or made it's own script, or even made one script per table, a good idea if you have hundreds or thousands of rows to load. (Some folks make a csv file and then issue a BULK INSERT on it, but I'd avoid that is it just gives you two files and a complex process [configuring drive mappings on deployment] to manage.)
The key thing to remember is that data (as stored in databases) can and will change over time. Rarely (if ever!) will you have the luxury of deleting your Production database and replacing it with a fresh, shiny, new one devoid of all that crufty data from the past umpteen years. Databases are all about changes over time, and that's where scripts come into their own. You start with the scripts to create the database, and then over time you add scripts that modify the database as changes come along -- and this applies to your static data (of any type) as well.
(Ultimately, my methodology is analogous to accounting: you have accounts, and as changes come in you adjust the accounts with journal entries. If you find you made a mistake, you never go back and modify your entries, you just make a subsequent entries to reverse and fix them. It's only an analogy, but the logic is sound.)
The solution I use is to have create and change scripts in source control, coupled with version information stored in the database.
Then, I have an install wizard that can detect whether it needs to create or update the db - the update process is managed by picking appropriate scripts based on the stored version information in the database.
See this thread's answer. Static data from your first two points should be in source control, IMHO.
Edit: *new
all-in-one or a separate script? it does not really matter as long as you (dev team) agree with your deployment team. I prefer to separate files, but I still can always create all-in-one.sql from those in the proper order [Logins, Roles, Users; Tables; Views; Stored Procedures; UDFs; Static Data; (Audit Tables, Audit Triggers)]
how do you make sure they execute it: well, make it another step in your application/database deployment documentation. If you roll out application which really needs specific (new) static data in the database, then you might want to perform a DB version check in your application. and you update the DB_VERSION to your new release number as part of that script. Then your application on a start-up should check it and report an error if the new DB version is required.
dev and non-dev static data: I have never seen this case actually. More often there is real static data, which you might call "dev", which is major configuration, ISO static data etc. The other type is default lookup data, which is there for users to start with, but they might add more. The mechanism to INSERT these data might be different, because you need to ensure you do not destoy (power-)user-created data.