how to compare/validate sql schema - sql

I'm looking for a way to validate the SQL schema on a production DB after updating an application version. If the application does not match the DB schema version, there should be a way to warn the user and list the changes needed.
Is there a tool or a framework (to use programatically) with built-in features to do that?
Or is there some simple algorithm to run this comparison?
Update: Red gate lists "from $395". Anything free? Or more foolproof than just keeping the version number?

Try this SQL.
- Run it against each database.
- Save the output to text files.
- Diff the text files.
/* get list of objects in the database */
SELECT name,
type
FROM sysobjects
ORDER BY type, name
/* get list of columns in each table / parameters for each stored procedure */
SELECT so.name,
so.type,
sc.name,
sc.number,
sc.colid,
sc.status,
sc.type,
sc.length,
sc.usertype ,
sc.scale
FROM sysobjects so ,
syscolumns sc
WHERE so.id = sc.id
ORDER BY so.type, so.name, sc.name
/* get definition of each stored procedure */
SELECT so.name,
so.type,
sc.number,
sc.text
FROM sysobjects so ,
syscomments sc
WHERE so.id = sc.id
ORDER BY so.type, so.name, sc.number

I hope I can help - this is the article I suggest reading:
Compare SQL Server database schemas automatically
It describes how you can automate the SQL Server schema comparison and synchronization process using T-SQL, SSMS or a third party tool.

You can do it programatically by looking in the data dictionary (sys.objects, sys.columns etc.) of both databases and comparing them. However, there are also tools like Redgate SQL Compare Pro that do this for you. I have specified this as a part of the tooling for QA on data warehouse systems on a few occasions now, including the one I am currently working on. On my current gig this was no problem at all, as the DBA's here were already using it.
The basic methodology for using these tools is to maintain a reference script that builds the database and keep this in version control. Run the script into a scratch database and compare it with your target to see the differences. It will also generate patch scripts if you feel so inclined.
As far as I know there's nothing free that does this unless you feel like writing your own. Redgate is cheap enough that it might as well be free. Even as a QA tool to prove that the production DB is not in the configuration it was meant to be it will save you its purchase price after one incident.

You can now use my SQL Admin Studio for free to run a Schema Compare, Data Compare and Sync the Changes. No longer requires a license key download from here http://www.simego.com/Products/SQL-Admin-Studio
Also works against SQL Azure.
[UPDATE: Yes I am the Author of the above program, as it's now Free I just wanted to Share it with the community]

If you are looking for a tool that can compare two databases and show you the difference Red Gate makes SQL Compare

You didn't mention which RDMBS you're using: if the INFORMATION SCHEMA views are available in your RDBMS, and if you can reference both schemas from the same host, you can query the INFORMATION SCHEMA views to identify differences in:
-tables
-columns
-column types
-constraints (e.g. primary keys, unique constraints, foreign keys, etc)
I've written a set of queries for exactly this purpose on SQL Server for a past job - it worked well to identify differences. Many of the queries were using LEFT JOINs with IS NULL to check for the absence of expected items, others were comparing things like column types or constraint names.
It's a little tedious, but its possible.

I found this small and free tool that fits most of my needs.
http://www.wintestgear.com/products/MSSQLSchemaDiff/MSSQLSchemaDiff.html
It's very basic but it shows you the schema differences of two databases.
It doesn't have any fancy stuff like auto generated scripts to make the differences to go away and it doesn't compare any data.
It's just a small, free utility that shows you schema differences :)

Make a table and store your version number in there. Just make sure you update it as necessary.
CREATE TABLE version (
version VARCHAR(255) NOT NULL
)
INSERT INTO version VALUES ('v1.0');
You can then check the version number stored in the database matches the application code during your app's setup or wherever is convenient.

SQL Compare by Red Gate.

Which RDBMS is this, and how complex are the potential changes?
Maybe this is just a matter of comparing row counts and index counts for each table -- if you have trigger and stored procedure versions to worry about also then you need something more industrial

Try dbForge Data Compare for SQL Server. It can compare and sync any databases, even very large ones. Quick, easy, always delivers a correct result.
Try it on your database and comment upon the product.
We can recommend you a reliable SQL comparison tool that offer 3 time’s faster comparison and synchronization of table data in your SQL Server databases. It's dbForge Data Compare for SQL Server.
Main advantages:
Speedier comparison and synchronization of large databases
Support of native SQL Server backups
Custom mapping of tables, columns, and schemas
Multiple options to tune your comparison and synchronization
Generating comparison and synchronization reports
Plus free 30-day trial and risk-free purchase with 30-day money back guarantee.

Related

compare data in sql server databases

I have a sql sever database on 2 servers. The structure of it is the same on both. A problem that I have is that I want to copy data between both databases - but the problem is I need to drop and recreate all the constraints first.
Any quick and easy way to script the differences between both databases, regarding data?
Yes, stop spending hours and hours trying to write a script that does this. Use a tried and true tool that handles all of that effort and debugging for you:
http://www.red-gate.com/products/sql-development/sql-data-compare/
There is a trial edition and there are several alternatives as well. Read this to see why you shouldn't re-invent the wheel:
http://madelinebertrand.com/2012/04/20/re-blog-the-cost-of-reinventing-the-wheel/
Just throwing my 2 cents in. If you have Visual Studio 2010 Premium or Ultimate, you can actually use feature called "Data Compare" to compare data between two databases. And it will be able to generate update script for target database as well.
I can only repeat the same opinion as Aaron Bertrand has, and to add to that, I had success using XSQL for this kind of task.
As far as I remember, it was a nice, consistent tool to use...
First you would need to turn the constraints off by altering the table in question such as:
alter table [table name] nocheck constraint all
Then you could query from the other server by linking or query directly using the following format:
select [cols] from [local table], [remote server.remote DB.remote table]

Querying for tables & columns named as keywords

Assume SQL Server 2005+.
Part A:
What is the canonical way to query from the system/internal/meta/whatever tables/views (sorry, not a database ninja) for any user table or column names that use SQL Server keywords (like case)?
I don't mind maintaining the list of keywords if that's not query-able, as it only changes with versions of SQL Server supported (right?).
Looking at available views in SQL Server 2005, I can easily enough query this information from INFORMATION_SCHEMA.COLUMNS and INFORMATION_SCHEMA.TABLES, but I want to be sure it's from the best possible location for future-proofing.
Part B:
Is it possible to get the list of keywords via query?
UPDATE: While a useful concept, I'm specifically not interested in escaping the column/table/etc names in question because I'm hoping to write a tool that will check for tables/columns/etc that share names with keywords and provide useful warnings to developers. The tool would be used during code reviews at my office to point out that the developer might want to consider renaming the entity. (Or hopefully by the developer before code reviews for their own good!) I may even set it up for use with continuous integration in my build scripts, but that's only a thought for the future.
You should properly quote the names used. If you generate code, use the built-in QUOTENAME function. Don't build a list of known keywords, instead quote every name used for every object, including database name, schema name and object name. Also make sure you always adhere to the correct case of the objects involved. As a best practice, develop on a case sensitive collation server instance. Developing code on case insensitive server collation (default) can lead to embarasing failures on production when deployed on case sensitive collation servers.
For Part A
Personally I would go for sys.columns and sys.objects actually. INFORMATION_SCHEMA views are also good, and they're 'portable' in theory, I'm just so much more used to the SQL specific ones though. I choose sys.objects vs. sys.tables because it covers more (eg. views). I would suggest you also cover table valued functions, table valued parameter types (in 2008 only) and temporary #tables and table #variables declared inside stored procedures. That would leave out only temp #tables and table #variables declared in batches sent by clients, but those are basically in client code only.
A: Just use brackets around your identifier.
select [procedure].[case] from [procedure]
B: I'm not sure if you can query for them, but there is a MSDN page about it.
If you need these programmatically, I suggest you insert them all into a table for your own uses.
Why do you need to know the list of keywords? a: they don't change very often, and b: for any regular code (I'm excluding things like "sql server management studio") you can just use square brackets:
SELECT [table].[column], [table].[join]
FROM [table]

How do I create and synchronize a combined reporting-only db from two live dbs?

I need to quickly implement a read-only database containing data pulled from two identically structured live databases.
The live dbs are actually company dbs from a Dynamics accounting system so I'm happy for any Dynamics specific advice but this is mostly a SQL question. It's a fairly old version of Dynamics from before Great Plains was acquired by Microsoft. This is on SQL Server 2000.
We have reports and applications which access the Dynamics data. These apps are designed to look at one company db. Now we need to add another. It's appropriate that most of these reports and apps see combined data. They don't really care which company an order or invoice exists in. They only look at a small number of the tables.
It seems to me that the simplest solution is to create a reports only db with combined data. Preferably, we need an efficient way to update this db with changes several times a day.
I'm a developer, not a db expert but here's my plan:
Create the combined reporting db with the required tables initially with the same table structure as the live dbs.
All Dynamics tables seem to have an int identity column called DEX_ROW_ID. I'm not sure what it's used for, (it's not indexed) but that seems like the obvious generic way to uniquely identify rows. On the reporting db I will change it to a normal int (not an identity). I will create a unique index on DEX_ROW_ID in all dbs.
Dynamics does not have timestamps so I will add a timestamp column to tables in the live dbs and a corresponding binary(8) column in the reporting db. I'm assuming and hoping that Dynamics won't be upset by the additional index and column.
Add an int CompanyId column to the reporting db tables and add it to the end of any unique indexes. Most data will be naturally unique even without that. ie, order and invoice numbers etc will be different for the two live dbs. We may need to make some minor changes to the applications but I'm not expecting to do much other than point them to the new reporting db.
Assuming my reporting db is called Reports, the live dbs are Live1 and Live2, the timestamp column is called TS and all dbs are on the same server ... here's my first attempt at an update script for copying the changes in one table called MyTable in Live1 to the reporting db.
USE Reports
CREATE TABLE #Changes
(
ReportId int,
LiveId int
)
/* Collect in a temp table the ids or rows which have been deleted or changed
in the live db L.DEX_ROW_ID will be null if the row has been deleted */
INSERT INTO #Changes
SELECT R.DEX_ROW_ID, L.DEX_ROW_ID
FROM MyTable R LEFT OUTER JOIN Live1.dbo.MyTable L ON L.DEX_ROW_ID = R.DEX_ROW_ID
WHERE R.CompanyId = 1 AND L.DEX_ROW_ID IS NULL OR L.TS <> R.TS
/* Delete rows that have been deleted or changed on the live db
I wonder if using join syntax would run better than the subquery. */
DELETE FROM MyTable
WHERE CompanyId = 1 AND DEX_ROW_ID IN (SELECT ReportId FROM #Changes)
/* Recopy rows that have changed in the live db */
INSERT INTO MyTable
SELECT 1 AS CompanyId, * FROM Live1.dbo.MyTable L
WHERE L.DEX_ROW_ID IN (SELECT ReportId FROM #Changes WHERE LiveId IS NOT NULL)
/* Copy the rows that are new in the live db */
INSERT INTO MyTable
SELECT 1 AS CompanyId, * FROM Live1.dbo.MyTable
WHERE DEX_ROW_ID > (SELECT MAX(DEX_ROW_ID) FROM MyTable WHERE CompanyId = 1)
Then do the same for the Live2 db. Repeat for every table in Reports. I know I should use a parameter #CompanyId instead of the literal but I can't do that for the live db name some I might generate these dynamically with a C# program or something.
I'm looking for any advice, suggestions or critique on what I'm doing here. I know it won't be atomic. Things could be happening on the live db while this script runs. I think we can live with that. We'll probably do a full copy either nightly or weekly when nothing is happening on the live dbs.
We need to favor performance over elegance or perfection. Some initial testing has the first query with the TS comparisons running at about 30 seconds for the biggest table so I'm optimistic that this is going to work but I'd also like to know if I'm missing something obvious or not seeing the forest for the trees.
We don't really want to deal with log files on the reporting db. Can we just set that to simple recovery model and forget about logs?
Thanks
I think there are a couple open questions here.
Do you need these reports to be near-real-time? Or is this this sort of reporting that could live with daily updates? But assume you need up-to-the-minute data.
Have you considered querying the databases directly and merging the data per-report on the fly? You'll have to do a lot of reporting to duplicate the effort that's going to go into designing, creating, and supporting a real-time merged replicated database.
Thirty seconds is (IMHO) unacceptable for any single query against a production database. There could be any number of tuning-related reasons for taking this long, but it at least means you're going to need serious professional SQL Server optimization resources (i.e. people). And if this is a problem for the queries for reports, it doesn't bode well for the queries to maintain a separate database for reporting.
Tuck into the back of your mind the consideration that, if you need to consolidate to a single database, it's worth considering whether you should make it an OLAP database rather than a mirror. The mirror will be quicker and easier, but the OLAP would be far more flexible and powerful in the long term; and it might be well to go the whole way from the beginning.
The last thing I'd want to do is write a custom update script. Try these bulletproof methods first:
Let's hope your production databases are backed up. Restore those backups every night to the reporting server. You can automate restores with the RESTORE command, which will work with a file on a network server.
Use SQL Server replication to push data from the live servers to the backend.
Schedule a DTS package every night to import the entire production database.
This might seem like brute force. But since you're copying a 2000-era database, brute force cannot be a problem with today's hardware. As an added advantage, these methods can be supported by a sysadmin instead of a developer.
Method 1 has the added added advantage of serving as backup verification. :)

best way in producing a master script for SQL

i want to extract specific database tables & stored procedures into one master script. Do you know any software that can help me do this faster? I've tried using the SQL Database publishing tool, but it's not that efficient since its gathering tables that I didn't select.
In SQL Server 2005, right click on the database, then select Tasks, and then select Generate Scripts.
Generating SQL Scripts in SQL Server 2005
As mentioned in that link, I'm fairly sure you have to generate the DROP and CREATE statements separately.
Try DBSourceTools. http://dbsourcetools.codeplex.com
Its open source, and specifically designed to script databases - tables, views, procs to disk.
It also allows you to select which tables, views, db-objects to script.
I use Redgate SQL compare for this (by comparing to an empty DB), as well as for doing upgrades between all my DB versions (I save a copy of the DB for each released version, and then just do a compare between current and previous to get a change script for that version).
I have found the "Generate Scripts" does a bad job in some cases with dependencies - eg, it will try to create a stored procedure that uses a table before the table is created, causing the script to fail. I'll accept I'm possibly using it wrong, but SQL Compare "just works". The scripts it generates are also enclosed in a transaction -- so if something fails, the whole change is rolled back. You don't end up with a half-populated or half-upgraded database.
Downside is that this is a commercial tool, but IMHO worth the money.

SQL Server Database Development Standards

I have to develop database development standards for our organisation for SQL Server and any code that interfaces to it. The code used can be anything from .NET code to VBScript to SQL Server Jobs.
Does anyone have a good link for this kind of thing?
My quick list is follows:
1) Naming Conventions
-- Stored Procedures usp_AppName_SPName
-- Functions usf_AppName_SPName
-- Indexes IX_TableName_IndexName
-- Tables AppName_TableName
-- Views VW_Name
2) Allocation of permissions to roles, never directly to users or groups
3) Allocation of roles to groups, never directly to users
4) Use of minimal permissions
5) No inline sql in code, always use SP or Functions
6) Use of explicit transactions
7) Readonly transactions where applicable
8) Always use explain plans to ensure sql is performant.
What other things do we need to cover? I am sure that there are lots of things....
Since we are talking best-practices I'd throw in a few things to avoid:
avoid use of xp_cmdshell
avoid dynamic sql unless strictly
necessary (such as for dynamic pivoting)
avoid cursors (if not on temp
tables)
P.S. Btw - I am doing all of the above ;)
I found the following quite useful:
http://www.ssw.com.au/ssw/Standards/Rules/RulesToBetterSQLServerDatabases.aspx
http://www.codeproject.com/KB/database/sqldodont.aspx
Also consider using multiple schemas. Use AppName.TableName instead of AppName_TableName, where AppName is a schema. The AdventureWorks sample does this, for instance.
I have to take issue with your first item right off the bat. While I know a lot of people like to use prefixes for stored procedures, tables, and the like, I've never had much use for that convention. When you start to get a lot of stored procedures that all start with "usp_", and you click to the expand the "Programmability\Stored Procedures" folder in Management Studio, it can be rather unwieldly to navigate.
Instead, require a prefix to match the logical feature set/functional group. What those prefixes are will vary by application or database. Then if you want to distinguish a stored procedure from a table, add your "_usp" requirement as a suffix.
For tables: you want something in your naming convention to distinguish between Application data (lookup tables) and User data.
Aren't roles and groups the same thing in SQL Server?
A few others...
Avoid using UDFs in WHERE clauses
Disallow direct SQL in applications
(always use SPs)
Use comment blocks in front of
views/procs/functions including a
revision history and/or revision
date
Use ANSI join syntax
Limit use of triggers, especially
for replicated tables