Normalization of an existing SQL database

I inherited a single-table database and migrated it to SQL Server, then normalized it by creating, linking, and filling a whole bunch of lookup-type tables that represent items in the main table. I now want to replace those items in the original table with their foreign keys. Am I stuck writing a bunch of queries or UDFs and then a giant INSERT statement to accomplish this, or is there a tool I can point at the various fields and have it handle the grunt work for me?

Redgate SQL Refactor comes with a 14-day evaluation period and has a "Split Table" refactoring that sounds like it might do what you need.
The feature is described thus:
"Split Table splits a table into two tables, and automatically rewrites the referencing stored procedures, views, and so on. You can also use this refactoring to introduce referential integrity tables. You can select this feature from the context menu in Management Studio’s Object Explorer."

I have had similar experiences. I once inherited a fairly large database that required serious overhaul to the schema before I would look at it without scorn.
Because the upgrade was fairly significant, I designed an SSIS package to migrate data from the old schema to the new. Lookup activities were helpful to map old text values to the new keys. I kept a script of my old schema and data handy and would repeatedly restore the database in a sandbox and re-migrate until I could satisfy the powers-that-be that the migration was reliable.
I found there was only a moderate learning curve to getting started with SSIS. If the tool is available to you, I recommend giving it a try.
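If no tool is available, the core of the job (mapping old text values to the new keys) can be done with one set-based UPDATE per lookup column. The following is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in for SQL Server so that it runs anywhere, and the table and column names (item, color, color_text, color_id) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE color (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    CREATE TABLE item (
        id INTEGER PRIMARY KEY,
        color_text TEXT,                       -- legacy denormalized value
        color_id INTEGER REFERENCES color(id)  -- new foreign key, NULL until mapped
    );
    INSERT INTO color (name) VALUES ('red'), ('blue');
    INSERT INTO item (color_text) VALUES ('red'), ('blue'), ('red'), ('purple');
""")

# One set-based UPDATE per lookup column: look up each text value's key.
conn.execute("""
    UPDATE item
    SET color_id = (SELECT c.id FROM color c WHERE c.name = item.color_text)
""")

# Rows whose text had no match in the lookup table need manual attention.
unmatched = conn.execute(
    "SELECT id, color_text FROM item "
    "WHERE color_id IS NULL AND color_text IS NOT NULL"
).fetchall()

mapped = conn.execute("SELECT id, color_id FROM item ORDER BY id").fetchall()
print(mapped)
print(unmatched)
```

Checking the unmatched rows before dropping the old text column is the step that makes the migration repeatable in a sandbox.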

Related

Why would you create an SQL table via a query?

In most tutorials on database design, you are shown to create and manipulate tables via queries. Sorry for a newbie question but when using SQL Server Management Studio, why would you create a table using a query and not just using the built-in functions to create tables and add attributes to them? (eg: right-click\create table, go to design view and add columns and specify domains, indexes, keys etc...)

Any development effort uses multiple environments: a development environment at the coding stage, then QA, then Model Office/UAT/Production.
Using scripts ensures that changes can be promoted automatically. It also ensures that manual errors are either eliminated or kept to a minimum.
Hand coding in each environment would be expensive and error prone. Scripts make it possible to have the same table structure in every environment.

I create tables using queries (and I store them in .sql files) because that way I can re-run them at a later time to recreate the full database structure.
This is more useful in a development/testing environment than in production, where I guess you wouldn't drop and re-create the entire database that often.
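As a rough illustration of why a stored script helps, the sketch below runs the same DDL against two fresh databases and confirms they end up with identical structure. It assumes Python's sqlite3 module as a stand-in for SQL Server, and the schema itself is made up for the example.

```python
import sqlite3

# Hypothetical schema script; in practice this would live in a .sql file.
SCHEMA_SQL = """
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE invoice (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    total NUMERIC NOT NULL
);
CREATE INDEX ix_invoice_customer ON invoice (customer_id);
"""

def build(schema_sql):
    """Create a fresh database and apply the schema script to it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(schema_sql)
    return conn

def schema_of(conn):
    """Return the objects defined in the database, in a stable order."""
    return conn.execute(
        "SELECT type, name, sql FROM sqlite_master "
        "WHERE name NOT LIKE 'sqlite_%' ORDER BY type, name"
    ).fetchall()

dev = build(SCHEMA_SQL)  # e.g. the development environment
qa = build(SCHEMA_SQL)   # e.g. the QA environment
same = schema_of(dev) == schema_of(qa)
print(same)
```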

To add a reason not already mentioned: it allows the scripts to be audited and reviewed, and potentially stored in a version control or issue tracking system. This will be necessary in complex or secure scenarios, especially in a fast-changing environment.

It looks more professional to write queries in tutorials :). In real life, it's simpler to alter a table through UI, but then again, you forget the SQL syntax that way. If you're not a Database Admin, it's not that important to know SQL syntax from a-z, in my opinion.

Database schemas WAY out of sync - need to get up to date without losing data

The problem: we have one application with a portion that is used by a very small subset of the total users, and that part of the application runs off a separate database. In a perfect world, the schemas of the two databases would be synced up, but such is not the case. Some migrations have been run on the smaller database, most haven't; furthermore, there is nothing like a revision number to easily identify which have and which haven't. We would like to solve this quandary for future projects. During a discussion we came up with the following possible plan of action, and I am wondering if anyone knows of any project which has already solved this problem:
What we would like to do is create an empty database from the schema of the large fully-migrated database, and then move all of the data from the smaller non-migrated database into that empty one. If it makes things easier, it can probably be assumed for the sake of this problem specifically that no migrations have ever removed anything, only added.
Else, if there are other known solutions, I'd like to hear them as well.

You could use a schema comparison tool like Red-Gate's SQL Compare. You can synchronize the changes and not lose any data. I wrote about this and many alternative tools ranging widely in price here:
http://bertrandaaron.wordpress.com/2012/04/20/re-blog-the-cost-of-reinventing-the-wheel/
The nice thing is that most tools have trial versions. So, you can try them out for 14 days (fully functional) and only buy one if it meets your expectations. I can't speak for the other tools, but I've been using RG for years and it is a very capable and reliable tool.
(Updated 2012-06-23 to help prevent link-rot.)

Red-Gate's SQL Compare, as Aaron Bertrand mentions in his answer, is a very good option. However, if you are not permitted to purchase something, an option is to try something like:
1) For each database, script out all the tables, constraints, indexes, views, procedures, etc.
2) Run a DIFF, go through all the differences, and make sure that the small DB can accept them. If not, implement any changes (including data) necessary on the small DB so it can accept them.
3) Create a new empty database from the schema of the large DB.
4) Import the data from the small DB into the new DB.
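The four steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: Python's sqlite3 and difflib modules stand in for SQL Server scripting and a diff tool, the person table is hypothetical, and (as the question allows) migrations are assumed to have only added columns.

```python
import difflib
import sqlite3

def script_schema(conn):
    """Step 1: script out every object definition in a stable order."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE sql IS NOT NULL ORDER BY name"
    ).fetchall()
    return [r[0] for r in rows]

def columns(conn, table):
    """Column names of a table, in declaration order."""
    return [r[1] for r in conn.execute(f"PRAGMA table_info({table})")]

large = sqlite3.connect(":memory:")  # fully migrated schema
large.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
small = sqlite3.connect(":memory:")  # older schema that holds the data
small.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
small.execute("INSERT INTO person (name) VALUES ('Ann'), ('Bob')")

# Step 2: diff the two scripts and review the differences by hand.
diff = list(difflib.unified_diff(
    script_schema(small), script_schema(large), lineterm=""))

# Step 3: create a new empty database from the large DB's schema.
target = sqlite3.connect(":memory:")
for stmt in script_schema(large):
    target.execute(stmt)

# Step 4: import the data, copying only the columns both schemas share.
shared = [c for c in columns(small, "person") if c in columns(target, "person")]
col_list = ", ".join(shared)
placeholders = ", ".join("?" for _ in shared)
rows = small.execute(f"SELECT {col_list} FROM person").fetchall()
target.executemany(
    f"INSERT INTO person ({col_list}) VALUES ({placeholders})", rows)

result = target.execute("SELECT id, name, email FROM person ORDER BY id").fetchall()
print(result)
```

Columns that exist only in the new schema are left NULL here; any that are NOT NULL without a default would need a value supplied during the import.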

You could also reverse engineer your database into Visual Studio as a database project. Visual Studio Team Suite Database Edition GDR R2 (I know long name) has the capability to do a schema comparison and data comparison, but the beauty of this approach is that you get all of your database into a nice database project where you can manage change and integrate with source control. This would allow you to build from a common source and deploy consistent changes.

How is Database Migration done?

I remember that in my previous job I needed to do a data migration. In that case, I needed to migrate to a new system that I was to develop, so it had a different table schema. I think I should first know:
In general, how is data migrated (with the same schema) to a different DB engine, e.g. MySQL -> MSSQL? In my case, my destination DB was MySQL and I used the MySQL Migration Toolkit.
I am thinking that in an enterprise app there may be stored procedures and triggers that also need to be imported.
If the table schema is different, how would I then go about doing this? In my previous job, I imported the data (in my case, from Access) into my destination (MySQL), leaving the table structures as they were, and then used SQL to select the data and manipulate it as required into the final destination tables.
In my case, I didn't have documentation for the old DB, and the columns were not named meaningfully (e.g. 'field1', 'field2'), so I needed to trace through the application code to work out what the columns meant. Is there any better way? And when columns contain multiple values as delimited data, is reading the code the only way?
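The staging approach described in the question (import the old tables as-is, then reshape them into the final tables) might look something like the sketch below. It assumes Python's sqlite3 module in place of MySQL, and it assumes, purely for illustration, that 'field1' turned out to be a product name and 'field2' a semicolon-delimited list of tags; all table names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Staging table: the legacy data imported with its original structure.
    CREATE TABLE staging_legacy (field1 TEXT, field2 TEXT);
    -- Final destination tables with meaningful names.
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE product_tag (
        product_id INTEGER NOT NULL REFERENCES product(id),
        tag TEXT NOT NULL
    );
    INSERT INTO staging_legacy VALUES ('Widget', 'small;metal'), ('Gadget', 'large');
""")

staged = conn.execute(
    "SELECT field1, field2 FROM staging_legacy ORDER BY rowid"
).fetchall()

for name, delimited in staged:
    cur = conn.execute("INSERT INTO product (name) VALUES (?)", (name,))
    product_id = cur.lastrowid
    # Split the delimited column into one child row per value.
    parts = [p.strip() for p in (delimited or "").split(";") if p.strip()]
    conn.executemany(
        "INSERT INTO product_tag (product_id, tag) VALUES (?, ?)",
        [(product_id, p) for p in parts],
    )
conn.commit()

tag_rows = conn.execute("""
    SELECT p.name, t.tag
    FROM product p JOIN product_tag t ON t.product_id = p.id
    ORDER BY p.id, t.tag
""").fetchall()
print(tag_rows)
```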

It really depends, but from your question I assume you want to hear what other people do.
So here is what I do in my current project.
I have to migrate from Oracle to Oracle but to a completely different schema.
The old system was 2-tier (old client, old database) the new system is 3-tier (new client, business logic, new database). We have more than 600 tables in the new schema.
After much pondering we scrapped the idea of doing a migration from old database to new database in SQL. We decided that in our case it would be much easier to go:
old database -> old client -> business logic -> new database
In the old database much of the data is stored in strange ways and the old client mangles it in complex ways. We have access to the source code of the old client but it is a very large system.
We wrote a migration tool that sits above the old client and the business logic.
We have some SQL before and some SQL after that, but the bulk of the data is migrated via the old client and business logic.
The downside is that it is slow, a complete migration taking more than 190 hours in our case, but otherwise it works well.
UPDATE
As far as stored procedures and triggers are concerned:
Even though we use the same DBMS in the old and new systems (both Oracle), the procedures and triggers are written from scratch for the new system.

When I've performed database migrations, I've used the application instead of a general tool to migrate the database. The application connects to two databases and copies objects from one to the other. You don't have to worry about schema or permissions or whatnot since all that is handled in the application, just like what happens when you set up the application in the first place.
Of course, this may not help you if your application doesn't support this. But if you're writing an application, I strongly recommend doing it this way.

I recommend the Wikipedia article for a good overview and links to the main commercial tools (and some non-commercial ones). Stored procedures (and kin, e.g. user-defined functions), if abundant, are going to be the "hot spots" in the migration, requiring rare and costly human skills -- as soon as you get away from the "declarative" mood of mainstream SQL, and into procedural code, you cannot expect automated tools to do a decent job (Turing's Theorem says that they actually can't, in a sufficiently general case;-). So, you need engineers with a good understanding of the procedural trappings of BOTH engines -- the one you're migrating from, the one you're migrating to. You can buy that -- it's one of the niches where consultants make REALLY good money!-)

If you are using MS SQL Server, you can use SSMS to script out the schema and all data in one go: SQL Server 2008: Script Data as Inserts.
If you are not using any/many non-standard SQL constructs, then you might be able to manually edit this script without too much effort.

Effectively transforming data from one SQL database to another in live environment

We have a bit of a messy database situation.
Our main back-office system is written in Visual FoxPro with local data (yes, I know!)
In order to effectively work with the data in our websites, we have chosen to regularly export data to a SQL database. However the process that does this basically clears out the tables each time and does a re-insert.
This means we have two SQL databases - one that our FoxPro export process writes to, and another that our websites read from.
This question is concerned with the transform from one SQL database to the other (SqlFoxProData -> SqlWebData).
For a particular table (one of our main application tables), because various data transformations take place during this process, it's not a matter of straightforward UPDATE, INSERT and DELETE statements using self-joins; we're having to use cursors instead (I know!)
This has been working fine for many months but now we are starting to hit upon performance problems when an update is taking place (this can happen regularly during the day)
Basically when we are updating SqlWebData.ImportantTable from SqlFoxProData.ImportantTable, it's causing occasional connection timeouts/deadlocks/other problems on the live websites.
I've worked hard at optimising queries, caching etc etc, but it's come to a point where I'm looking for another strategy to update the data.
One idea that has come to mind is to have two copies of ImportantTable (A and B) and some concept of which table is currently 'active', updating the non-active table and then switching the currently active table,
i.e. websites read from ImportantTableA whilst we're updating ImportantTableB, then we switch websites to read from ImportantTableB.
Question is, is this feasible and a good idea? I have done something like it before but I'm not convinced it's necessarily good for optimisation/indexing etc.
Any suggestions welcome, I know this is a messy situation... and the long term goal would be to get our FoxPro application pointing to SQL.
(We're using SQL 2005 if it helps)
I should add that data consistency isn't particularly important in this instance, seeing as the data is always slightly out of date.

There are a lot of ways to skin this cat.
I would attack the locking issues first. It is extremely rare that I would use CURSORS, and I think improving the performance and locking behavior there might resolve a lot of your issues.
I expect that I would solve it by using two separate staging tables. One for the FoxPro export in SQL and one transformed into the final format in SQL side-by-side. Then either swapping the final for production using sp_rename, or simply using 3 INSERT/UPDATE/DELETE transactions to apply all changes from the final table to production. Either way, there is going to be some locking there, but how big are we talking about?

You should be able to maintain one db for the website and just replicate to that table from the other sql db table.
This is assuming that you do not update any data from the website itself.

"For a particular table (one of our main application tables), because various data transformations take place during this process, it's not a straightforward UPDATE, INSERT and DELETE statements using self-joins, but we're having to use cursors instead (I know!)"
I cannot think of a case where I would ever need to perform an insert, update or delete using a cursor. If you can write the select for the cursor, you can convert it into an insert, update or delete. You can join to other tables in these statements and use the CASE statement for conditional processing. Taking the time to do this in a set-based fashion may solve your problem.
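As a hedged illustration of the set-based idea, the sketch below applies a transformation with CASE and brings a target table in line with its source using three statements (DELETE, UPDATE, INSERT) inside one transaction, with no cursor. It uses Python's sqlite3 module rather than SQL Server, and the tables, columns, and status codes are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_order (id INTEGER PRIMARY KEY, status_code TEXT, amount REAL);
    CREATE TABLE web_order (id INTEGER PRIMARY KEY, status TEXT, amount REAL);
    INSERT INTO source_order VALUES (1, 'O', 10.0), (2, 'C', 20.0), (3, 'X', 5.0);
    INSERT INTO web_order VALUES (1, 'open', 9.0), (4, 'closed', 1.0);
""")

# Conditional transformation expressed once, as a CASE expression.
STATUS_CASE = """CASE s.status_code
                     WHEN 'O' THEN 'open'
                     WHEN 'C' THEN 'closed'
                     ELSE 'unknown'
                 END"""

with conn:  # one transaction: commit on success, roll back on error
    # Remove rows that no longer exist in the source.
    conn.execute(
        "DELETE FROM web_order WHERE id NOT IN (SELECT id FROM source_order)")
    # Refresh rows that exist on both sides.
    conn.execute(f"""
        UPDATE web_order
        SET status = (SELECT {STATUS_CASE} FROM source_order s
                      WHERE s.id = web_order.id),
            amount = (SELECT s.amount FROM source_order s
                      WHERE s.id = web_order.id)
        WHERE id IN (SELECT id FROM source_order)
    """)
    # Add rows that are new in the source.
    conn.execute(f"""
        INSERT INTO web_order (id, status, amount)
        SELECT s.id, {STATUS_CASE}, s.amount
        FROM source_order s
        WHERE s.id NOT IN (SELECT id FROM web_order)
    """)

synced = conn.execute("SELECT id, status, amount FROM web_order ORDER BY id").fetchall()
print(synced)
```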
One thing you may consider if you have lots of data to move: we occasionally create a view over the data we want and then have two tables, one active and one that data will be loaded into. When the data has finished loading, as part of your process run a simple command to switch the table the view uses to the one you just finished loading. That way the users are only down for a couple of seconds at most. You won't create locking issues where they are trying to access data as you are loading.
You might also look at using SSIS to move the data.
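A minimal sketch of the view-switching idea, assuming Python's sqlite3 module in place of SQL Server and hypothetical names (important_a, important_b, and a view called important that the websites would read). SQLite has no ALTER VIEW, so the sketch drops and recreates the view inside a transaction; on SQL Server an ALTER VIEW statement would be the natural equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE important_a (id INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE important_b (id INTEGER PRIMARY KEY, val TEXT);
    INSERT INTO important_a VALUES (1, 'old');
    -- Readers only ever query the view, never the tables directly.
    CREATE VIEW important AS SELECT id, val FROM important_a;
""")

def reload_and_switch(conn, inactive, rows):
    """Load fresh data into the inactive table, then repoint the view at it."""
    if inactive not in ("important_a", "important_b"):
        raise ValueError(f"unexpected table name: {inactive}")
    # Slow part: readers are unaffected because the view still uses the other table.
    conn.execute(f"DELETE FROM {inactive}")
    conn.executemany(f"INSERT INTO {inactive} (id, val) VALUES (?, ?)", rows)
    conn.commit()
    # Fast part: swap the view definition in a single short transaction.
    conn.executescript(f"""
        BEGIN;
        DROP VIEW IF EXISTS important;
        CREATE VIEW important AS SELECT id, val FROM {inactive};
        COMMIT;
    """)

before = conn.execute("SELECT val FROM important ORDER BY id").fetchall()
reload_and_switch(conn, "important_b", [(1, "new"), (2, "newer")])
after = conn.execute("SELECT val FROM important ORDER BY id").fetchall()
print(before, after)
```

The next refresh would load important_a and switch back, so the two tables alternate roles.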

Do you have the option of making the updates more atomic, rather than the stated 'clear out and re-insert'? I think Visual Fox Pro supports triggers, right? For your key tables, can you add a trigger to the update/insert/delete to capture the ID of records that change, then move (or delete) just those records?
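The trigger idea above could be sketched as follows. Note the assumptions: this uses SQLite triggers through Python's sqlite3 module purely to show the pattern (the syntax for Visual FoxPro triggers differs), and the tables src, dst, and change_log are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src (id INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE dst (id INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE change_log (id INTEGER PRIMARY KEY);  -- IDs touched since last sync

    CREATE TRIGGER src_ins AFTER INSERT ON src BEGIN
        INSERT OR IGNORE INTO change_log (id) VALUES (NEW.id);
    END;
    CREATE TRIGGER src_upd AFTER UPDATE ON src BEGIN
        INSERT OR IGNORE INTO change_log (id) VALUES (NEW.id);
    END;
    CREATE TRIGGER src_del AFTER DELETE ON src BEGIN
        INSERT OR IGNORE INTO change_log (id) VALUES (OLD.id);
    END;
""")

def sync(conn):
    """Move only the changed rows to dst; return how many IDs were processed."""
    with conn:
        # Drop the stale copies of every changed or deleted row.
        conn.execute("DELETE FROM dst WHERE id IN (SELECT id FROM change_log)")
        # Re-copy the rows that still exist in the source.
        conn.execute("""
            INSERT INTO dst (id, val)
            SELECT s.id, s.val FROM src s
            WHERE s.id IN (SELECT id FROM change_log)
        """)
        count = conn.execute("SELECT COUNT(*) FROM change_log").fetchone()[0]
        conn.execute("DELETE FROM change_log")
    return count

conn.execute("INSERT INTO src VALUES (1, 'a'), (2, 'b')")
first = sync(conn)
conn.execute("UPDATE src SET val = 'b2' WHERE id = 2")
conn.execute("DELETE FROM src WHERE id = 1")
second = sync(conn)
final = conn.execute("SELECT id, val FROM dst ORDER BY id").fetchall()
print(first, second, final)
```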
Or how about writing all changes to an offline database, and letting SQL Server replication take care of the sync?
[sorry, this would have been a comment, if I had enough reputation!]

Based on your response to Ernie above, you asked how you replicate databases. Here is Microsoft's how-to about replication in SQL2005.
However, if you're asking about replication and how to do it, it indicates to me that you are a little light in experience for SQL server. That being said, it's fairly easy to muck things up and while I'm all for learning by experience, if this is mission critical data, you might be better off hiring a DBA or at the very least, testing the #$##$% out of this before you actually implement it.

How should I organize my master ddl script

I am currently creating a master ddl for our database. Historically we have used backup/restore to version our database, and not maintained any ddl scripts. The schema is quite large.
My current thinking:
Break script into parts (possibly in separate scripts):
table creation
add indexes
add triggers
add constraints
Each script would get called by the master script.
I might need a script to drop constraints temporarily for testing
There may be orphaned tables in the schema, I plan to identify suspect tables.
Any other advice?
Edit: Also if anyone knows good tools to automate part of the process, we're using MS SQL 2000 (old, I know).

I think the basic idea is good.
The nice thing about building all the tables first and then building all the constraints is that the tables can be created in any order. When I've done this I had one file per table, which I put in a directory called "Tables", and then a script which executed all the files in that directory. Likewise I had a folder for constraint scripts (which did foreign keys and indexes too), which were executed after the tables were built.
I would separate the build of the triggers and stored procedures, and run these last. The point about these is they can be run and re-run on the database without affecting the data. This means you can treat them just like ordinary code. You should include "if exists...drop" statements at the beginning of each trigger and procedure script, to make them re-runnable.
So the order would be
table creation
add indexes
add constraints
Then
add triggers
add stored procedures
On my current project we are using MSBuild to run the scripts. There are some extension targets that you can get for it which allow you to call SQL scripts. In the past I have used Perl, which was fine too (and batch files... which I would not recommend - they're too limited).
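A small script runner along these lines could look like the sketch below. It is only an illustration: Python's sqlite3 module stands in for SQL Server, the folder names and script contents are hypothetical, and the "if exists...drop" guard is written in SQLite syntax (DROP TRIGGER IF EXISTS).

```python
import sqlite3
import tempfile
from pathlib import Path

BUILD_ORDER = ["Tables", "Indexes", "Triggers"]  # hypothetical folder names

def run_scripts(conn, root, folders=BUILD_ORDER):
    """Execute every .sql file in each folder, folder by folder, in name order."""
    executed = []
    for folder in folders:
        for path in sorted((Path(root) / folder).glob("*.sql")):
            conn.executescript(path.read_text())
            executed.append(f"{folder}/{path.name}")
    return executed

with tempfile.TemporaryDirectory() as root:
    files = {
        "Tables/account.sql":
            "CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL);",
        "Tables/audit.sql":
            "CREATE TABLE audit (account_id INTEGER, note TEXT);",
        "Indexes/ix_audit.sql":
            "CREATE INDEX ix_audit_account ON audit (account_id);",
        # Re-runnable: drop the trigger first if it already exists.
        "Triggers/trg_account.sql": """
            DROP TRIGGER IF EXISTS trg_account_upd;
            CREATE TRIGGER trg_account_upd AFTER UPDATE ON account BEGIN
                INSERT INTO audit VALUES (NEW.id, 'updated');
            END;
        """,
    }
    for rel, sql in files.items():
        path = Path(root) / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(sql)

    conn = sqlite3.connect(":memory:")
    first_run = run_scripts(conn, root)                 # full build
    second_run = run_scripts(conn, root, ["Triggers"])  # code objects only

print(first_run)
print(second_run)
```

Only the trigger folder is re-run here, which mirrors the point that triggers and procedures can be redeployed without touching the data.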

@Adam
Or how about just by domain -- a useful grouping of related tables in the same file, but separate from the rest?
Only problem is if some domains (in this somewhat legacy system) are tightly coupled. Plus you have to maintain the dependencies between your different sub-scripts.

If you are looking for an automation tool, I have often worked with EMS SQLManager, which allows you to generate automatically a ddl script from a database.
Data inserts in reference tables might be mandatory before putting your database on line. This can even be considered as part of the ddl script. EMS can also generate scripts for data inserts from existing databases.
The need for indexes might not be properly estimated at the DDL stage. You will just need to declare them for primary/foreign keys. Other indexes should be created later, once views and queries have been defined.

What you have there seems to be pretty good. My company has on occasion, for large enough databases, broken it down even further, perhaps to the individual object level. In this way each table/index/... has its own file. Can be useful, can be overkill. Really depends on how you are using it.
@Justin
By domain is mostly always sufficient. I agree that there are some complexities to deal with when doing it this way, but that should be easy enough to handle.
I think this method provides a little more separation (which in a large database you will come to appreciate) while still being pretty manageable. We also write Perl scripts that do a lot of the processing of these DDL files, so that might be a good way to handle it.

There is a neat tool that will iterate through the entire SQL Server instance and extract all the table, view, stored procedure and UDF definitions to the local file system as SQL scripts (text files). I have used this with 2005 and 2008; not sure how it will work with 2000 though. Check out http://www.antipodeansoftware.com/Home/Products

Invest the time to write a generic "drop all constraints" script, so you don't have to maintain it.
A cursor over the following statements does the trick.
Select * From Information_Schema.Table_Constraints
Select * From Information_Schema.Referential_Constraints
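As an alternative to a cursor, the DROP statements can be generated from the rows those queries return. The sketch below is an assumption-laden illustration in Python: it takes tuples of (TABLE_SCHEMA, TABLE_NAME, CONSTRAINT_NAME, CONSTRAINT_TYPE) as they would come back from Information_Schema.Table_Constraints, and emits ALTER TABLE ... DROP CONSTRAINT statements with foreign keys first, since they depend on the primary key and unique constraints. The sample rows are made up, and the simple bracket quoting does not handle names that contain a closing bracket.

```python
def drop_constraint_statements(rows):
    """Build T-SQL DROP CONSTRAINT statements from Table_Constraints rows.

    rows: iterable of (TABLE_SCHEMA, TABLE_NAME, CONSTRAINT_NAME, CONSTRAINT_TYPE).
    """
    # Foreign keys go first because they reference PK/UNIQUE constraints.
    priority = {"FOREIGN KEY": 0, "CHECK": 1, "UNIQUE": 2, "PRIMARY KEY": 3}
    ordered = sorted(rows, key=lambda r: (priority.get(r[3], 1), r[0], r[1], r[2]))
    return [
        f"ALTER TABLE [{schema}].[{table}] DROP CONSTRAINT [{name}];"
        for schema, table, name, _type in ordered
    ]

# Hypothetical sample of what the metadata query might return.
sample = [
    ("dbo", "Customer", "PK_Customer", "PRIMARY KEY"),
    ("dbo", "Invoice", "FK_Invoice_Customer", "FOREIGN KEY"),
    ("dbo", "Invoice", "PK_Invoice", "PRIMARY KEY"),
]
statements = drop_constraint_statements(sample)
for stmt in statements:
    print(stmt)
```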

I previously organised my DDL code as one file per entity and made a tool that combined these into a single DDL script.
My former employer used a scheme where all table DDL was in one file (stored in Oracle syntax), indices in another, constraints in a third and static data in a fourth. A change script was kept in parallel with this (again in Oracle). The conversion to SQL Server was manual. It was a mess. I actually wrote a handy tool that will convert Oracle DDL to SQL Server (it worked 99.9% of the time).
I have recently switched to using Visual Studio Team System for Database professionals. So far it works fine, but there are some glitches if you use CLR functions within the database.
