Defining table structure for a database? - sql

Up until now, my experience with databases has always been through an intermediate definition layer we have where I work. That is, the SQL for the table definitions wasn't written directly; it was generated from an intermediate file that wrote out the SQL scripts for creating the appropriate tables, upgrade scripts between schema changes, and helper functions for doing simple queries/updates/inserts/deletes against the database.
Now I'm in a situation where I don't have access to that, for reasons I won't get into, and I find myself somewhat lost at sea regarding what to do. I need to have a small number of tables in a database, and I'm unsure what's usually done to manage the table definitions.
Do people normally just use the SQL script that does the table creation as their definition, or does everyone just use an IDE that manages the definition in a separate file and regenerates the SQL script to create the tables?
I'd really prefer not to have to introduce a dependence on a specific IDE, because as we all know, developers are whiners that are prone to religious debates over small things.

Open your favorite text editor -> Start writing CREATE scripts -> Save -> Put in Source Control
That script now becomes the basis for your database. Anytime there are schema changes, they get put back into the scripts so that they don't get lost.
These become your definition.
I find it more reliable than depending on any specific IDE/Platform generating those scripts for you.
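A minimal sketch of what such a hand-written CREATE script might look like (T-SQL; the table and column names are only placeholders):
-- Definition script kept in source control; names are illustrative only.
CREATE TABLE dbo.Customer
(
    CustomerId INT           NOT NULL IDENTITY(1,1),
    Name       NVARCHAR(100) NOT NULL,
    CreatedAt  DATETIME      NOT NULL DEFAULT (GETDATE()),
    CONSTRAINT PK_Customer PRIMARY KEY (CustomerId)
);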

We write the scripts ourselves and store them in source control like any other code. Then the scripts that are appropriate for a particular version are all grouped together and promoted to prod together. Make sure to use ALTER TABLE when changing existing tables because you don't want to drop and recreate them if they have data! I use a drop and recreate for all other objects, though. If you need to add records to a particular table (usually a lookup of some type), we do that in scripts as well. Then that too gets promoted with the rest of the version code.
For me, putting the scripts in source control, however they are generated, is the key step. This is how you know what you have changed for the next release. This is how you can see earlier versions and revert back easily if there is a problem. Treat database code the same way you treat all other code.
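For example, a versioned change script in that style might look roughly like this (SQL Server syntax; the tables are hypothetical, a sketch only):
-- Existing table with data: alter it, don't drop and recreate it.
ALTER TABLE dbo.Customer ADD PreferredColorId INT NULL;

-- Lookup/reference data promoted in the same versioned script.
INSERT INTO dbo.Color (ColorName) VALUES ('Red');
INSERT INTO dbo.Color (ColorName) VALUES ('Blue');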

You could use one of the data modelling tools that create the scripts for you, if you are starting out on a database design and eventually want the tool to generate it for you. Some tools for that are Erwin, Fabforce, etc. (though not free).
If you have access to an IDE like SQL Server Management Studio, you can create them using a GUI that's pretty simple.
If you are writing your own code, it's always better to write your own scripts based on a good template, so that you cover all the properties of the table definition such as the filegroup, collation and so on.
Once you do create a base copy, generate scripts and keep a base reference copy of them, so that you can make "incremental" changes on top and manage them in source control. Hope this helps.

Though I use TOAD for Oracle, I always write the scripts to create my database objects by hand. It gives you (and your DBAs) more control and knowledge of what's being created and how.

If your schema is too difficult to describe in SQL, you probably have other issues more pressing than which IDE. Use modelling documentation if you need a graphical representation, but yeah, you don't need an IDE.

There are multiple ways out there for what you are asking.
The old, traditional way is to ship a script file with your application that contains the CREATE TABLE statements.
If you are a developer, and specifically a Java enterprise developer, you could generate the complete schema using a persistence library such as Hibernate; there are how-tos available for this.
If you are a DBA-level user, you could take a schema export from one environment and import it into your current environment. This is a standard practice among DBAs, but as you can see it requires admin access, and the method depends on the database you are using (Oracle, DB2, etc.).


Why would you create an SQL table via a query?

In most tutorials on database design, you are shown how to create and manipulate tables via queries. Sorry for a newbie question, but when using SQL Server Management Studio, why would you create a table using a query and not just use the built-in functions to create tables and add attributes to them? (e.g. right-click > Create Table, go to design view, add columns, and specify domains, indexes, keys, etc.)
In any development effort, multiple environments are used: a development environment at the coding stage, then QA, then Model Office/UAT/production.
Using scripts ensures that changes can be promoted automatically. It also ensures that manual errors are either eliminated or kept to a minimum.
Hand-coding the changes in each environment would be expensive and error prone. Scripts make it possible to have the same table structure everywhere.
I create tables using queries (and I store them in .sql files) because that way I can re-run them at a later time to recreate the full database structure.
This is more useful in a development/testing environment than in production, where I guess you wouldn't drop and re-create the entire database that often.
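For instance, a re-runnable .sql file of that kind usually guards each object so the whole script can be executed repeatedly against a dev/test database (a sketch with placeholder names; it drops data, so it's not for production):
IF OBJECT_ID('dbo.Color', 'U') IS NOT NULL
    DROP TABLE dbo.Color;

CREATE TABLE dbo.Color
(
    ColorId   INT         NOT NULL IDENTITY(1,1) PRIMARY KEY,
    ColorName VARCHAR(50) NOT NULL
);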
To add a reason not already mentioned - it allows the scripts to be audited/reviewed and potentially stored in a version control or issue tracking system. This will be necessary in complex or secure scenarios, especially in a fast-changing environment.
It looks more professional to write queries in tutorials :). In real life, it's simpler to alter a table through the UI, but then again, you forget the SQL syntax that way. If you're not a database admin, it's not that important to know SQL syntax from A to Z, in my opinion.

Getting a significant amount of data into a SQL Server (Express) database at time of deployment

For most database-backed projects I've worked on, there is a need to get "startup" or test data into the database before deploying the project. Examples of startup data: a table that lists all the countries in the world or a table that lists a bunch of colors that will be used to populate a color palette.
I've been using a system where I store all my startup data in an Excel spreadsheet (with one table per worksheet), then I have a utility script in SQL that (1) creates the database, (2) creates the schemas, (3) creates the tables (including primary and foreign keys), (4) connects to the spreadsheet as a linked server, and (5) inserts all the data into the tables.
I mostly like this system. I find it very easy to lay out columns in Excel, verify foreign key relationships using simple lookup functions, perform concatenation operations, copy in data from web tables or other spreadsheets, etc. One major disadvantage of this system is the need to sync up the columns in my worksheets any time I change a table definition.
I've been going through some tutorials to learn new .NET technologies or design patterns, and I've noticed that these typically involve using Visual Studio to create the database and add tables (rather than scripts), and the data is typically entered using the built-in designer. This has me wondering if maybe the way I'm doing it is not the most efficient or maintainable.
Questions
In general, do you find it preferable to build your whole database via scripts or a GUI designer, such as SSMSE or Visual Studio?
What method do you recommend for populating your database with startup or test data and why?
Clarification
Judging by the answers so far, I think I should clarify something. Assume that I have a significant amount of data (hundreds or thousands of rows) that needs to find its way into the database. This data could be sourced from various places, such as text files, spreadsheets, web tables, etc. I've received several suggestions to script this process using INSERT statements, but is this really viable when you're talking about a lot of data?
Which leads me to...
New questions
How would you write a SQL script to take the country data on this page and insert it into the database?
With Excel, I could just copy/paste the table into a worksheet and run my utility script, and I'd basically be done.
What if you later realized you needed a new column, CapitalCity?
With Excel, I could take that information from this page, paste it into Excel, and with a quick text-to-column manipulation, I'd have the data in the format I need.
I honestly didn't write this question to defend Excel as the best way or even a good way to get data into a database, but the answers so far don't seem to be addressing my main concern--how to get all this data into your database. Writing a script with hundreds of INSERT statements by hand would be extremely time consuming and error prone. Somehow, this script needs to be machine generated, but how?
I think your current process is fine for seeding the database with initial data. It's simple, easy to maintain, and works for you. If you've got a good database design with adequate constraints then it doesn't really matter how you seed the initial data. You could use an intermediate tool to generate scripts but why bother?
SSIS has a steep learning curve, doesn't work well with source control (impossible to tell what changed between versions), and is very finicky about type conversions from Excel. There's also an issue with how many rows it reads ahead to determine the data type -- you're in deep trouble if your first x rows contain numbers stored as text.
1) I prefer to use scripts for several reasons.
• Scripts are easy to modify, and plus when I get ready to deploy my application to a production environment, I already have the scripts written so I'm all set.
• If I need to deploy my database to a different platform (like Oracle or MySQL) then it's easy to make minor modifications to the scripts to work on the target database.
• With scripts, I'm not dependent on a tool like Visual Studio to build and maintain the database.
2) I like good old-fashioned INSERT statements in a script. Again, at deployment time scripts are your best friend. At our shop, when we deploy our applications we have to have scripts ready for the DBAs to run, as that's what they expect.
I just find that scripts are simple, easy to maintain, and the "least common denominator" when it comes to creating a database and loading data into it. By least common denominator, I mean that the majority of people (i.e. DBAs, or other people in your shop who might not have Visual Studio) will be able to use them without any trouble.
The other thing that's important with scripts is that they force you to learn SQL and, more specifically, DDL (data definition language). While the hand-holding GUI tools are nice, there's no substitute for taking the time to learn SQL and DDL inside out. I've found those skills invaluable in almost any shop.
Frankly, I find the concept of using Excel here a bit scary. It obviously works, but it's creating a dependency on an ad-hoc data source that won't be resolved until much later. Last thing you want is to be in a mad rush to deploy a database and find out that the Excel file is mangled, or worse, missing entirely. I suppose the severity of this would vary from company to company as a function of risk tolerance, but I would be actively seeking to remove Excel from the equation, or at least remove it as a permanent fixture.
I always use scripts to create databases, because scripts are portable and repeatable - you can use (almost) the same script to create a development database, a QA database, a UAT database, and a production database. For this reason it's equally important to use scripts to modify existing databases.
I also always use a script to create bootstrap data (AKA startup data), and there's a very important reason for this: there's usually more scripting to be done afterward. Or at least there should be. Bootstrap data is almost invariably read-only, and as such, you should be placing it on a read-only filegroup to improve performance and prevent accidental changes. So you'll generally need to script the data first, then make the filegroup read-only.
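A rough sketch of that sequence, assuming a SQL Server database with a hypothetical REF_DATA filegroup that already holds the lookup tables:
-- Load the bootstrap data first...
INSERT INTO ref.Country (CountryCode, CountryName) VALUES ('FR', 'France');
INSERT INTO ref.Country (CountryCode, CountryName) VALUES ('DE', 'Germany');

-- ...then mark the filegroup read-only to protect it from accidental changes.
ALTER DATABASE MyAppDb MODIFY FILEGROUP REF_DATA READ_ONLY;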
On a more philosophical level, though, if this startup data is required for the database to work properly - and most of the time, it is - then you really ought to consider it part of the data definition itself, the metadata. For that reason, I don't think it's appropriate to have the data defined anywhere but in the same script or set of scripts that you use to create the database itself.
Test data is a little different, but in my experience you're usually trying to auto-generate that data in some fashion, which makes it even more important to use a script. You don't want to have to manually maintain an ad-hoc database of millions of rows for testing purposes.
If your problem is that the test or startup data comes from an external source - a web page, a CSV file, etc. - then I would handle this with an actual "configuration database." This way you don't have to validate references with VLOOKUPS as in Excel, you can actually enforce them.
Use SQL Server Integration Services (formerly DTS) to pull your external data from CSV, Excel, or wherever, into your configuration database - if you need to periodically refresh the data, you can save the SSIS package so it ends up being just a couple of clicks.
If you need to use Excel as an intermediary, i.e. to format or restructure some data from a web page, that's fine, but the important thing IMO is to get it out of Excel as soon as possible, and SSIS with a config database is an excellent repeatable method of doing that.
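If a full SSIS package feels heavy for a simple refresh, a plain BULK INSERT into the configuration database can do the same pull from a CSV file; this is only an alternative sketch, with a made-up staging table and file path:
-- Load a CSV exported from the web page / spreadsheet into a staging table.
BULK INSERT config.CountryStaging
FROM 'C:\imports\countries.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);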
When you are ready to migrate the data from your configuration database into your application database, you can use SQL Server Management Studio to generate a script for the data (in case you don't already know - when you right click on the database, go to Tasks, Generate Scripts, and turn on "Script Data" in the Script Options). If you're really hardcore, you can actually script the scripting process, but I find that this usually takes less than a minute anyway.
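And if you'd rather script the scripting yourself than click through the Generate Scripts dialog, a query that emits the INSERT statements from the configuration table is one rough way to do it (a sketch; table and column names are placeholders):
-- Each row of the result is itself an INSERT statement you can paste into a script.
SELECT 'INSERT INTO dbo.Country (CountryCode, CountryName) VALUES ('''
       + CountryCode + ''', ''' + REPLACE(CountryName, '''', '''''') + ''');'
FROM config.Country
ORDER BY CountryCode;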
It may sound like a lot of overhead, but in practice the effort is minimal. You set up your configuration database once, create an SSIS package once, and refresh the config data maybe once every few months or maybe never (this is the part you're already doing, and this part will become less work). Once that "setup" is out of the way, it's really just a few minutes to generate the script, which you can then use on all copies of the main database.
Since I use an object-relational mapper (Hibernate, there is also a .NET version), I prefer to generate such data in my programming language. The ORM then takes care of writing things into the database. I don't have to worry about changing column names in the data because I need to fix the mapping anyway. If refactoring is involved, it usually takes care of the startup/test data also.
Excel is an unnecessary component of this process.
Script the current version of the database components that you want to reuse, and add the script to your source control system. When you need to make changes in the future, either modify the entities in the database and regenerate the script, or modify the script and regenerate the database.
Avoid mixing Visual Studio's db designer and Excel as they only add complexity. Scripts and SQL Management Studio are your friends.

How should I migrate DDL changes from one environment to the next?

I make DDL changes using SQL Developer's GUI. Problem is, I need to apply those same changes to the test environment. I'm wondering how others handle this issue. Currently I'm having to manually write ALTER statements to bring the test environment into alignment with the development environment, but this is prone to error (doing the same thing twice). In cases where there's no important data in the test environment I usually just blow everything away, export the DDL scripts from dev and run them from scratch in test.
I know there are triggers that can store each DDL change, but this is a heavily shared environment and I would like to avoid that if possible.
Maybe I should just write the DDL stuff manually rather than using the GUI?
I've seen I-don't-know-how-many ways of handling this tried, and in the end I think you just need to maintain manual scripts.
Now, you don't necessarily have to write them yourself. In MSSQL, as you're making a change, there is a little button to Generate Script, which will spit out a SQL script for the change you are making. I know you're talking about Oracle, and it's been a few years since I worked with their GUI, but I can only imagine they have the same feature.
However, you can't get away from working with scripts manually. You're going to have a lot of issues around pre-existing data, like default values for new columns or how to handle data for a renamed/deleted/moved column. This is just part of the analysis of working with a database schema over time that you can't get away from. If you try to do this with a completely automated solution, your data is going to get messed up sooner or later.
The one thing I would recommend, just to make your life a little easier, is to make sure you separate schema changes from code changes. The difference is that schema changes to tables and columns must be run exactly once and never again, and therefore have to be versioned as individual change scripts. However, code changes, like stored procs, functions, and even views, can (and should) be run over and over, and can be versioned just like any other code file. The best approach to this I've seen was when we had all of the procs/functions/views in VSS, and our build process would drop them all and recreate them during every update. This is the same idea as doing a rebuild of your C#/Java/whatever code, because it makes sure everything is always up to date.
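In SQL Server, the re-runnable pattern for that kind of code object looks roughly like this (a sketch; the procedure and table are hypothetical; in Oracle, CREATE OR REPLACE gives you the same re-runnability):
IF OBJECT_ID('dbo.GetCustomerById', 'P') IS NOT NULL
    DROP PROCEDURE dbo.GetCustomerById;
GO

CREATE PROCEDURE dbo.GetCustomerById
    @CustomerId INT
AS
BEGIN
    SELECT CustomerId, Name
    FROM dbo.Customer
    WHERE CustomerId = @CustomerId;
END
GO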
Here's a trigger I implemented to track DDL changes. Sources used:
http://www.dba-oracle.com/t_ddl_triggers.htm
http://www.orafaq.com/forum/t/68667/0/
CREATE OR REPLACE TRIGGER ddl_trig
AFTER CREATE OR DROP OR ALTER ON scott.SCHEMA
DECLARE
    li       ora_name_list_t;
    ddl_text CLOB;
BEGIN
    -- Reassemble the full DDL statement from the pieces Oracle provides.
    FOR i IN 1..ora_sql_txt(li) LOOP
        ddl_text := ddl_text || li(i);
    END LOOP;

    -- Record when it happened, the event type, the object affected, and the DDL text.
    INSERT INTO my_audit_tbl
    VALUES (SYSDATE,
            ORA_SYSEVENT,
            ORA_DICT_OBJ_TYPE,
            ORA_DICT_OBJ_NAME,
            ddl_text);
END;
/
Never use the GUI for such things. Write the scripts and put them into source control.
Database Change Management / Database Diff
Some tools for that are –
1) Oracle Change Management Pack
From the docs –
It allows us to take a baseline (snapshot) at a fixed point in time, and later we can see how the DB schema and objects have changed since then. The CMP can also generate DDL scripts, though I am not sure we would want to use that.
Details
http://download-east.oracle.com/docs/cd/B19306_01/em.102/b31949/change_management.htm
http://www.oracle.com/technology/products/oem/pdf/change-management-pack-11g-datasheet.pdf
2) PL/SQL Developer Compare User Objects feature
This is available from Tools -> Compare User Objects
3) Oracle SQL Developer Database Diff feature
This is available from Tools -> Database diff
http://www.oracle.com/technology/products/database/sql_developer/files/what_is_sqldev.html#copy See “Schema Copy and Compare”
#1 looks to be most versatile and flexible but DBA rights may be necessary.
#2 & 3 can be used by any developer. I think Oracle SQL Developer is easier and provides more options.
Using any of the above options can help in –
Identifying the changed objects; the list may also serve as a checklist before submission of the MAC.
The developers concerned can take ownership of specific changed objects.
You can do this nicely with Toad.
You use the Compare Schemas function to find all the differences (it's very flexible; you can specify which object types to look at, and many other options). It will show you the differences, you can have a look and make sure it seems right, and then tell it to generate an update script for you. Voila. The only catch is, you need the DBA Module to generate the sync script, which is an extra cost. But I'd say it's worth it if you do this often. (Or if you can get hold of an older Toad version, pre-9.0 I think, there's a bug which allows you to extract the sync script without the DBA Module. :))
Toad isn't cheap, but having used it for years I consider it indispensable, and well worth the price for any Oracle developer or DBA.

Best Database Change Control Methodologies

As a database architect, developer, and consultant, there are many questions I can answer. One that I was asked recently, and still can't answer well, is...
"What is one of, or some of, the best methods or techniques to keep database changes documented, organized, and yet able to roll out effectively either in a single-developer or multi-developer environment."
This may involve stored procedures and other object scripts, but especially schemas - from documentation, to the new physical update scripts, to rollout, and then full circle. There are applications to make this happen, but they require schema hooks and overhead. I would rather hear about techniques used without a lot of extra third-party involvement.
The easiest way I have seen this done without the aid of an external tool is to create a "schema patch", if you will. The schema patch is just a simple T-SQL script. Each patch is given a version number within the script, and that number is stored in a table in the database that receives the changes.
Any new change to the database means creating a new schema patch. The patches can then be run in sequence: the runner detects what version the database is currently on and applies all schema patches in between. Afterwards the schema version table is updated with the date/time each patch was executed, ready for the next run.
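A minimal sketch of that mechanism in T-SQL (the table, columns, and the example change are all just illustrative):
-- One-time setup: a table recording which patch level the database is on.
CREATE TABLE dbo.SchemaVersion
(
    VersionNumber INT      NOT NULL,
    AppliedAt     DATETIME NOT NULL
);

-- Patch 2: only applied if the database is still on version 1.
IF (SELECT MAX(VersionNumber) FROM dbo.SchemaVersion) = 1
BEGIN
    ALTER TABLE dbo.Customer ADD MiddleName NVARCHAR(50) NULL;
    INSERT INTO dbo.SchemaVersion (VersionNumber, AppliedAt) VALUES (2, GETDATE());
END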
A good book that goes into details like this is called Refactoring Databases.
If you wish to use an external tool, you can look at Ruby's Migrations project or a similar tool in C# called Migrator.NET. These tools work by creating C# classes/Ruby classes with a "forward" and a "backward" migration. They are more feature-rich because they know how to go forward as well as backwards through the schema patches. As you stated, however, you are not interested in an external tool, but I thought I would add that for other readers anyway.
I rather liked this series:
http://odetocode.com/Blogs/scott/archive/2008/02/03/11746.aspx
In my case I have a script generated every time I change the database. I name the scripts like 00001.sql, ..., n.sql, and I have a table holding the number of the last script I have executed. You can also see Database Documentation.
As long as you only add columns/tables to your database, it is an easy task to script these changes in advance in .sql files; you just execute them, perhaps in a defined order.
A good solution is to have one file per table, so that all changes belonging to that table are visible to whoever is working on it (it's like working on a class). The same is valid for stored procedures and views.
A more difficult task (and where tools might help) is stepping back. As long as you only added tables/columns, that may not be a big issue. But if you dropped columns in an update and now have to undo that update, the data is not there anymore; you will need to get it back from a backup. Keep in mind that if you have more than a few tables this can be a big job, and normally you need to undo your update very fast!
If you can simply restore the backup, then you are fine at that moment. But if you deploy on Monday, your clients work until Wednesday, and only then do they notice that some data is missing (data you just dropped from a table), you cannot simply restore the old database.
I have a model-based approach in mind (sorry, not implemented at the moment) in which schema changes are "modeled" (e.g. in XML) and, during an update, a processor (e.g. a C# program) creates all the necessary SQL and, for example, moves the data to a "dropDatabase". The data can stay there, and if for some reason I need to restore some of the dropped data, I can do it with the processor. I think that over time (years) this approach pays off, because otherwise developers stop touching "old" tables since they no longer know whether a table or column is really necessary. With this approach you don't risk too much if you drop something!
What I do is:
All the DDL commands required to recreate the schema (and the stored procedures and the indexes, etc) are in a script.
To be sure the script is OK, it is tested from time to time (create a database, run the script and restore the backup and check the database works well).
For change control, the script is kept in a Version Control System (I typically use Subversion).
The catch is that, if the database cannot be brought down and recreated when, say, a column is added, I have two changes to make: an ALTER TABLE plus a modification to the script. A bit more work but, in the long term, it wins.

How should I organize my master ddl script

I am currently creating a master ddl for our database. Historically we have used backup/restore to version our database, and not maintained any ddl scripts. The schema is quite large.
My current thinking:
Break script into parts (possibly in separate scripts):
table creation
add indexes
add triggers
add constraints
Each script would get called by the master script.
I might need a script to drop constraints temporarily for testing
There may be orphaned tables in the schema; I plan to identify suspect tables.
Any other advice?
Edit: Also if anyone knows good tools to automate part of the process, we're using MS SQL 2000 (old, I know).
I think the basic idea is good.
The nice thing about building all the tables first and then building all the constraints is that the tables can be created in any order. When I've done this I had one file per table, which I put in a directory called "Tables", and then a script which executed all the files in that directory. Likewise I had a folder for constraint scripts (which did foreign keys and indexes too), which were executed after the tables were built.
I would separate the build of the triggers and stored procedures, and run these last. The point about these is they can be run and re-run on the database without affecting the data. This means you can treat them just like ordinary code. You should include "if exists...drop" statements at the beginning of each trigger and procedure script, to make them re-runnable.
So the order would be
table creation
add indexes
add constraints
Then
add triggers
add stored procedures
On my current project we are using MSBuild to run the scripts. There are some extension targets that you can get for it which allow you to call SQL scripts. In the past I have used Perl, which was fine too (and batch files... which I would not recommend - they're too limited).
#Adam
Or how about just by domain -- a useful grouping of related tables in the same file, but separate from the rest?
Only problem is if some domains (in this somewhat legacy system) are tightly coupled. Plus you have to maintain the dependencies between your different sub-scripts.
If you are looking for an automation tool, I have often worked with EMS SQLManager, which allows you to automatically generate a DDL script from a database.
Data inserts into reference tables might be mandatory before putting your database online; this can even be considered part of the DDL script. EMS can also generate scripts for data inserts from existing databases.
The need for indexes might not be properly estimated at the DDL stage. You will just need to declare them for primary/foreign keys; other indexes should be created later, once views and queries have been defined.
What you have there seems to be pretty good. My company has on occasion, for large enough databases, broken it down even further, perhaps to the individual object level. In this way each table/index/... has its own file. Can be useful, can be overkill. Really depends on how you are using it.
#Justin
By domain is mostly always sufficient. I agree that there are some complexities to deal with when doing it this way, but that should be easy enough to handle.
I think this method provides a little more separation (which in a large database you will come to appreciate) while still being pretty manageable. We also write Perl scripts that do a lot of the processing of these DDL files, so that might be a good way to handle that.
There is a neat tool that will iterate through the entire SQL Server and extract all the table, view, stored procedure and UDF definitions to the local file system as SQL scripts (text files). I have used this with 2005 and 2008; not sure how it will work with 2000 though. Check out http://www.antipodeansoftware.com/Home/Products
Invest the time to write a generic "drop all constraints" script, so you don't have to maintain it.
A cursor over the following statements does the trick.
Select * From Information_Schema.Table_Constraints
Select * From Information_Schema.Referential_Constraints
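One way to use those views is to generate the ALTER statements rather than hand-maintain them. A minimal sketch (foreign keys only, which are the constraints that block dropping or reloading tables; run the output, or wrap the SELECT in a cursor and EXEC each statement):
SELECT 'ALTER TABLE [' + TABLE_SCHEMA + '].[' + TABLE_NAME + ']'
       + ' DROP CONSTRAINT [' + CONSTRAINT_NAME + '];'
FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS
WHERE CONSTRAINT_TYPE = 'FOREIGN KEY';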
I previously organised my DDL code by one file per entity and made a tool that combined them into a single DDL script.
My former employer used a scheme where all table DDL was in one file (stored in Oracle syntax), indexes in another, constraints in a third, and static data in a fourth. A change script was kept in parallel with this (again in Oracle). The conversion to SQL Server was manual. It was a mess. I actually wrote a handy tool that converts Oracle DDL to SQL Server (it worked 99.9% of the time).
I have recently switched to using Visual Studio Team System for Database professionals. So far it works fine, but there are some glitches if you use CLR functions within the database.