This is a problem I come back to on occasion, and I have yet to work out an answer I'm happy with. I'm looking for a build system that works well for building a database: that is, running all of the SQL files in the correct database instance, as the correct user, and in the correct order, and handling dependencies and the like properly.
I have a system that I hacked together using GNU Make, and it works, but it's not especially flexible and frankly can be a bit of a pain to work with in some situations. I've also considered looking at things like SCons and CMake, but I don't know how much better they are likely to be, or whether a better system already exists...
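As a rough illustration of the ordering part of the problem, the sketch below uses Python's standard `graphlib` module (Python 3.9+) to turn a dependency manifest into a run order. The file names and the manifest itself are hypothetical, and choosing the right instance and user for each file is not covered here.

```python
from graphlib import TopologicalSorter

# Hypothetical manifest: each SQL file maps to the set of files it depends on.
# A real build might read these from header comments in the scripts themselves.
DEPENDENCIES: dict[str, set[str]] = {
    "tables/customer.sql": set(),
    "tables/orders.sql": {"tables/customer.sql"},
    "views/customer_orders.sql": {"tables/customer.sql", "tables/orders.sql"},
    "procs/place_order.sql": {"tables/orders.sql"},
}


def build_order(deps: dict[str, set[str]]) -> list[str]:
    """Return the files so that each one comes after everything it depends on.

    graphlib raises CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for path in build_order(DEPENDENCIES):
        print(path)
```

Files with no mutual dependency may come out in any relative order, so the build should not rely on anything the manifest does not state.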
Just a shell script that runs all the create statements and imports in the proper order. You may also find migrations (which come with Rails) interesting. They provide a make-like infrastructure that lets you maintain a database whose structure evolves over time.
Say you add a new column to some table. With migrations you'd write a snippet of code that describes how to add the column and also how to roll back the change, so you can switch between different versions of your schema automatically.
I'm not a big fan of the tight integration with Rails, but the principles behind it are very interesting.
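The up/rollback idea can be sketched outside Rails. Below is a minimal migration runner in Python against SQLite. The `MIGRATIONS` list and the `schema_version` table are hypothetical and much simpler than what Rails actually keeps.

```python
import sqlite3

# Hypothetical migrations: (version, SQL to apply, SQL to roll back).
# Rails migrations hold the same up/down pair, written in Ruby rather than raw SQL.
MIGRATIONS = [
    (1, "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
        "DROP TABLE customer"),
    # ALTER TABLE ... DROP COLUMN requires SQLite 3.35 or newer.
    (2, "ALTER TABLE customer ADD COLUMN email TEXT",
        "ALTER TABLE customer DROP COLUMN email"),
]


def current_version(conn: sqlite3.Connection) -> int:
    """Highest applied migration version, or 0 for an empty database."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    return conn.execute("SELECT MAX(version) FROM schema_version").fetchone()[0] or 0


def migrate(conn: sqlite3.Connection, target: int) -> None:
    """Apply (upward) or roll back (downward) migrations until at `target`."""
    version = current_version(conn)
    if target > version:
        for v, up, _down in MIGRATIONS:
            if version < v <= target:
                conn.execute(up)
                conn.execute("INSERT INTO schema_version (version) VALUES (?)", (v,))
    else:
        for v, _up, down in reversed(MIGRATIONS):
            if target < v <= version:
                conn.execute(down)
                conn.execute("DELETE FROM schema_version WHERE version = ?", (v,))
    conn.commit()
```

Calling `migrate(conn, 2)` on an empty database applies both steps. `migrate(conn, 1)` then removes only the `email` column.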
For SQL Server, I just use a batch file with SQLCMD.EXE and a bunch of .SQL files. It's not perfect, but it seems to work.
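A batch file like that essentially loops over the .sql files and calls sqlcmd once per file. Here is the same loop sketched in Python. The server, database and folder names are placeholders, and the sketch assumes Windows authentication (`-E`). The `-b` switch makes sqlcmd exit with an error code when a batch fails, so the loop stops at the first broken script.

```python
import subprocess
from pathlib import Path


def sqlcmd_args(server: str, database: str, script: Path) -> list[str]:
    """Command line for running one .sql file with sqlcmd.

    -E uses Windows authentication; -b makes sqlcmd return a non-zero exit
    code when a batch fails.
    """
    return ["sqlcmd", "-S", server, "-d", database, "-E", "-b", "-i", str(script)]


def run_scripts(server: str, database: str, folder: Path) -> None:
    # Scripts run in file-name order, so a numeric prefix (01_, 02_, ...) sets the order.
    for script in sorted(folder.glob("*.sql")):
        # check=True raises CalledProcessError as soon as one script fails.
        subprocess.run(sqlcmd_args(server, database, script), check=True)
```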
For my database, I use Migrator.NET.
This is a .NET framework that allows you to create classes in which you define your DDL statements.
The framework comes with a command-line tool that executes your 'migrations' in the correct order.
It also has an MSBuild task, so you can integrate it into a continuous integration build as well.
First, export full DDL files describing all tables, views, source code (procedures, functions, packages), sequences, and grants of a DB schema.
See "Is there a tool to generate a full database DDL for SQL Server? What about Postgres and MySQL?"
I created a database build system in Python (part SQL parser, part makefile) to put these files together into a DB creation script.
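The "put these files together" step can be sketched as follows. This is not that author's tool, and it skips the SQL-parsing part. It assumes a hypothetical layout with one folder per object type and one .sql file per object, and it concatenates the files in a fixed category order.

```python
import tempfile
from pathlib import Path

# Assumed layout: <root>/<category>/<object>.sql. The folder names and their
# order are illustrative, not taken from the original build system.
CATEGORY_ORDER = ["sequences", "tables", "views", "functions",
                  "procedures", "packages", "grants"]


def combine_ddl(root: Path) -> str:
    """Concatenate every <root>/<category>/*.sql file into one creation script."""
    parts = []
    for category in CATEGORY_ORDER:
        folder = root / category
        if not folder.is_dir():
            continue
        for sql_file in sorted(folder.glob("*.sql")):
            parts.append(f"-- {category}/{sql_file.name}\n")  # provenance comment
            parts.append(sql_file.read_text().rstrip() + "\n\n")
    return "".join(parts)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "tables").mkdir()
        (root / "views").mkdir()
        (root / "tables" / "customer.sql").write_text("CREATE TABLE customer (id INT);")
        (root / "views" / "v_customer.sql").write_text(
            "CREATE VIEW v_customer AS SELECT id FROM customer;")
        print(combine_ddl(root))
```

A fixed category order is the simplest option. Dependencies inside a category (a view built on another view, for example) would still need parsing or an explicit manifest.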
Is it possible to use Npgsql in a way that basically mimics pg_dumpall to a single output file without having to iterate through each table in the database? Conversely, I'd also like to be able to take such output and use Npgsql to restore an entire database if possible.
I know that with more recent versions of Npgsql I can use the BeginBinaryExport, BeginTextExport, or BeginRawBinaryCopy methods to export from the database to STDOUT or to a file. On the other side of the process, I can use the BeginBinaryImport, BeginTextImport, or BeginRawBinaryCopy methods to import from STDIN or an existing file. However, from what I've been able to find so far, these methods use the COPY SQL syntax, which (AFAIK) is limited to a single table at a time.
Why am I asking this question? I currently have an old batch file that I use to export my production database to a file (using pg_dumpall.exe) before importing it back into my testing environment (using psql.exe with the < operation). This has been working pretty much flawlessly for quite a while now, but we've recently moved the server to an off-site hosted environment, which is causing a delay that prevents the batch file from completing successfully. Because of the potential for other connectivity/timeout issues, I'm thinking of moving the batch file's functionality to a .NET application, but this part has got me a bit stumped.
Thanks for your help and let me know if you need any further clarification.
This has been asked for in https://github.com/npgsql/npgsql/issues/1397.
Long story short, Npgsql doesn't have any sort of support for dumping/restoring entire databases. Implementing that would be a pretty significant effort that would pretty much duplicate all the pg_dump logic, and the danger for subtle omissions and bugs would be considerable.
If you just need to dump data for some tables, then as you mentioned, the COPY API is pretty good for that. If, however, you need to also save the schema itself as well as other, non-table entities (sequence state, extensions...), then the only current option AFAIK is to execute pg_dump as an external process (or use one of the other backup/restore options).
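Running pg_dump as an external process also lets the calling program impose its own timeout and check the exit code. That addresses the flaky batch file. The sketch is in Python for brevity, and in a .NET application the equivalent is starting a `System.Diagnostics.Process`. Host, user and file names are placeholders. The password is assumed to come from the `PGPASSWORD` environment variable or a `.pgpass` file. pg_dump covers one database, and pg_dumpall (plain SQL output only) can be invoked the same way for the whole cluster.

```python
import subprocess


def pg_dump_command(host: str, port: int, user: str, dbname: str, out_file: str) -> list[str]:
    """Build a pg_dump command line that writes a custom-format archive."""
    return [
        "pg_dump",
        "--host", host,
        "--port", str(port),
        "--username", user,
        "--format", "custom",  # restore later with pg_restore
        "--file", out_file,
        dbname,
    ]


def run_dump(cmd: list[str], timeout_s: float = 3600) -> None:
    # check=True raises CalledProcessError on a non-zero exit code; timeout raises
    # TimeoutExpired instead of hanging indefinitely on a slow connection.
    subprocess.run(cmd, check=True, timeout=timeout_s)
```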
I've been thinking of ways to improve managing changes to our database structure. I have a build server that creates nightly builds, so I was thinking we could somehow create database dumps, backups, and scripts from the test environment as part of the build process. Then when deploying an update to the client we could use a tool like DBDiff to create the database update script.
Is anybody doing something similar? Is it even a good idea? Any good tips on what to use to create these dumps on the build server?
Rather than identifying the differences, I recommend keeping a proper script that creates the database from scratch.
We are quite satisfied with using Liquibase to manage all DB migration in our projects. It knows which "patches" have been applied and ensures that only those that are missing will be applied to the target database.
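The core bookkeeping behind "only apply the missing patches" can be sketched in a few lines. This is not Liquibase's implementation: the `applied_changeset` table and the changesets are hypothetical. Liquibase keeps its own `DATABASECHANGELOG` table, which also stores a checksum per changeset, and this sketch imitates that check.

```python
import hashlib
import sqlite3

# Hypothetical changesets: (id, SQL).
CHANGESETS = [
    ("001-create-customer", "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)"),
    ("002-create-orders", "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
                          "customer_id INTEGER REFERENCES customer(id))"),
]


def apply_missing(conn: sqlite3.Connection, changesets: list[tuple[str, str]]) -> list[str]:
    """Apply only the changesets not yet recorded; return the ids applied this run."""
    conn.execute("CREATE TABLE IF NOT EXISTS applied_changeset "
                 "(id TEXT PRIMARY KEY, checksum TEXT NOT NULL)")
    recorded = dict(conn.execute("SELECT id, checksum FROM applied_changeset"))
    applied = []
    for cs_id, sql in changesets:
        checksum = hashlib.sha256(sql.encode()).hexdigest()
        if cs_id in recorded:
            # Editing a changeset after it was applied is treated as an error.
            if recorded[cs_id] != checksum:
                raise ValueError(f"changeset {cs_id} was modified after being applied")
            continue
        conn.execute(sql)
        conn.execute("INSERT INTO applied_changeset (id, checksum) VALUES (?, ?)",
                     (cs_id, checksum))
        applied.append(cs_id)
    conn.commit()
    return applied
```

Running the same list twice is a no-op the second time. Appending a new changeset applies just that one.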
This is possible.
The differencing is the hard part. Once you identify the differences, you need to construct the appropriate SQL and then apply it. You can either apply it directly or create a script that you can run after review.
When both sides change, you need to decide whether the target system should keep its change or have it completely removed.
Remember that changes on the target system also include data, and if you remove a table or column, your referential integrity might be completely ruined.
One more thought: you will need access to the target system in order to determine the diff. If this is a generic utility, you will need to make it an executable that runs after the fact, not part of the build.
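To make "the differencing is the hard part" concrete, here is a deliberately narrow sketch against SQLite that compares columns only. It scripts the safe direction, adding columns the target lacks. Anything that would remove something on the target is only reported for review, which follows the warning about data and referential integrity. Real comparison tools also handle types, constraints, indexes and code objects.

```python
import sqlite3


def columns(conn: sqlite3.Connection) -> dict[str, dict[str, str]]:
    """Map table name -> {column name: declared type} for a SQLite database."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
    return {t: {row[1]: row[2] for row in conn.execute(f'PRAGMA table_info("{t}")')}
            for t in tables}


def diff_columns(source: dict[str, dict[str, str]],
                 target: dict[str, dict[str, str]]) -> tuple[list[str], list[str]]:
    """Return (statements adding missing columns, notes needing human review).

    Dropping columns that exist only on the target could destroy data, so those
    are listed for review instead of being scripted.
    """
    statements, review = [], []
    for table, src_cols in source.items():
        tgt_cols = target.get(table)
        if tgt_cols is None:
            review.append(f"table {table} is missing on target")
            continue
        for col, col_type in src_cols.items():
            if col not in tgt_cols:
                statements.append(f'ALTER TABLE "{table}" ADD COLUMN "{col}" {col_type}')
        for col in sorted(tgt_cols.keys() - src_cols.keys()):
            review.append(f"column {table}.{col} exists only on target")
    return statements, review
```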
You will find the Visual Database Tools very useful here.
http://msdn.microsoft.com/en-us/library/y5a4ezk9.aspx
There is a schema compare built right into Visual Studio (it can also be run from the command line). There is also a database project that contains a complete set of scripts for the database and the objects that it contains. This can be checked into source control along with your source code.
You can deploy a new database based on these scripts with a context menu click.
Have a look at http://www.codeproject.com/KB/architecture/Database_CI.aspx and http://www.martinfowler.com/articles/evodb.html - there's a fair amount of thinking that's already available.
We are currently looking at the Juneau CTP release, SQL Tools for Visual Studio. It has a snapshot and schema comparison feature. Basically, it can auto-generate scripts between two schemas for you. If you use this against two versions of your database, it will give you an upgrade script.
http://msdn.microsoft.com/en-us/data/gg427686
Here at Red Gate we're close to releasing a solution which solves that precise issue using SQL Source Control and SQL Compare. We have an early access program which will allow you to try this out. Please visit the following link for sign-up details.
http://www.red-gate.com/MessageBoard/viewtopic.php?p=46951#46951
Up until now, my experience with databases has always been working with an intermediate definition layer that we have where I work. i.e. SQL wasn't directly written for the table definitions, but generated from an intermediate file which wrote out SQL scripts for creating the appropriate tables, upgrade scripts between schema changes, and helper functions for doing simple queries/updates/inserts/deletes from the database.
Now I'm in a situation where I don't have access to that, for reasons I won't get into, and I find myself somewhat lost at sea regarding what to do. I need to have a small number of tables in a database, and I'm unsure what's usually done to manage the table definitions.
Do people normally just use the SQL script that does the table creation as their definition, or does everyone just use an IDE that manages the definition in a separate file and regenerates the SQL script to create the tables?
I'd really prefer not to have to introduce a dependence on a specific IDE, because as we all know, developers are whiners that are prone to religious debates over small things.
Open your favorite text editor -> Start writing CREATE scripts -> Save -> Put in Source Control
That script now becomes the basis for your database. Any time there are schema changes, they get put back into the scripts so that they don't get lost.
These become your definition.
I find it more reliable than depending on any specific IDE/Platform generating those scripts for you.
We write the scripts ourselves and store them in source control like any other code. Then the scripts that are appropriate for a particular version are all grouped together and promoted to prod together. Make sure to use ALTER TABLE when changing existing tables, because you don't want to drop and recreate them if they have data! I use a drop and recreate for all other objects, though. If you need to add records to a particular table (usually a lookup of some type), we do that in scripts as well. Then that too gets promoted with the rest of the version code.
For me, putting the scripts in source control, however they are generated, is the key step. This is how you know what you have changed for the next release. This is how you can see earlier versions and revert easily if there is a problem. Treat database code the same way you treat all other code.
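A release script in that style might look like the sketch below, written against SQLite so it runs standalone. The table, column and view names are hypothetical. Existing tables are changed in place with ALTER TABLE and never dropped. The view is dropped and recreated. Lookup rows use `INSERT OR IGNORE`, so running the release a second time is harmless.

```python
import sqlite3


def add_column_if_missing(conn: sqlite3.Connection, table: str, column: str, decl: str) -> None:
    """Change an existing table in place with ALTER TABLE; never drop it, as it may hold data."""
    existing = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
    if column not in existing:
        conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{column}" {decl}')


# Lookup rows and views: inserted idempotently or dropped/recreated,
# so this part can run on every deployment.
OBJECTS_SCRIPT = """
INSERT OR IGNORE INTO customer_status (id, label) VALUES (1, 'active'), (2, 'closed');
DROP VIEW IF EXISTS v_customer;
CREATE VIEW v_customer AS
    SELECT c.id, c.name, s.label AS status
    FROM customer c LEFT JOIN customer_status s ON s.id = c.status_id;
"""


def deploy_release(conn: sqlite3.Connection) -> None:
    # Table changes for this (hypothetical) release: a new lookup table and a new column.
    conn.execute("CREATE TABLE IF NOT EXISTS customer_status "
                 "(id INTEGER PRIMARY KEY, label TEXT NOT NULL)")
    add_column_if_missing(conn, "customer", "status_id", "INTEGER")
    conn.executescript(OBJECTS_SCRIPT)  # executescript commits any pending transaction first
    conn.commit()
```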
You could use one of the data modelling tools that create scripts for you if you are starting out on a database design and eventually want the tool to create the database for you. Some tools for that are Erwin, Fabforce, etc. (though not free).
If you have access to an IDE like SQL Server Management Studio, you can create them using a GUI that's pretty simple.
If you are writing your own code, it's always better to write your own scripts based on a good template, so that you cover all the properties of the table definition, such as the file group, collation and so on. Hope this helps.
Once you have created a base copy, generate scripts and keep them as a base reference copy, so that you can make "incremental" changes to them and manage them in source control.
Though I use TOAD for Oracle, I always write the scripts to create my database objects by hand. It gives you (and your DBAs) more control and knowledge of what's being created and how.
If your schema is too difficult to describe in SQL, you probably have more pressing issues than which IDE to use. Use modelling documentation if you need a graphical representation, but yeah, you don't need an IDE.
There are multiple ways out there for what you are asking.
The old, traditional way is to have a script file with the CREATE TABLE statements ready alongside your application.
If you are a developer, and a Java enterprise developer at that, you could generate the complete schema using a persistence library called Hibernate. Here is a how-to.
If you are a DBA-level user, you could take a schema export from one environment and import it into your current environment. This is standard practice among DBAs, but as you can see it requires admin access. Also, the methods depend on the database you are using (Oracle, DB2, etc.).
I had a (friendly but heated) argument with my lead developer the other day because our project has T-SQL scripts that I code directly into SQL files, which I then run against the database. I find that when I do this, it's easy to work out the schema in advance without fiddly pointing and clicking. There's also no opportunity to forget to generate a script to put into source control, because generating the script is no longer a chore you have to do after the fact but an implicit part of the process (it also leads to cleaner scripts without the extra crap that SQL Server Management Studio inserts into the scripts it generates).
My lead developer insists that having to manually script it out is a pain in the arse and that he absolutely refuses to write his scripts by hand when there are perfectly good tools to do it without coding. I've noticed that the copying of his changes into the actual scripts tends to get delayed a bit as a result though.
What are your thoughts on the pros and/or cons of doing it one way vs the other? Am I being too rigid/old-school in my sticking to hand coding schema scripts or is he being too reliant on third party tools and losing something in the process?
I always script stuff myself, because the wizards sometimes don't script things the way I like, and they also give funky names to defaults.
Scripting things yourself is also good in case you get laid off and have to go to an interview where they ask you to script DDL on the whiteboard.
As I usually collaborate with a colleague during the schema design, I tend to design the schema using the GUI tools, as it's easier to discuss it with a diagram of the tables in front of you. I then generate the scripts, being careful to select the exact options I want so I don't have to make manual changes post-export.
I think a decision on the relative merits of the two approaches might take into account factors such as:
the frequency of changes to the schema
the frequency with which changes need to be propagated to other schemas (test, user acceptance, production, clients * n, etc)
the degree to which the schema may vary across development branches
how well-known in advance your various changes can be scheduled
whether or not you can generate SQL "diff" scripts between schemas.
On balance, I tend to prefer to work with a script for each change (or "migration"). It lets me resequence change releases as priorities shift.
Just because you can create tables in a graphical tool doesn't necessarily mean you should.
I find it's as quick to write a script as it is to use SQLMS. You still have to type names in SQLMS, and the time spent moving between keyboard and mouse could be spent writing the proper script anyway.
The two of you are almost working with two sets of code. Consistency seems to be a key factor in these types of decisions. In your case, if you create a script and your boss uses the GUI to add a field, how do you stay in sync? You can't use your script to rebuild the table without editing it (a chance for error).
Maybe he should pull rank and force you to format your scripts the same way the GUI creates them - just kidding.
I think you should flip on it...
I am currently creating a master DDL script for our database. Historically we have used backup/restore to version our database and have not maintained any DDL scripts. The schema is quite large.
My current thinking:
Break script into parts (possibly in separate scripts):
table creation
add indexes
add triggers
add constraints
Each script would get called by the master script.
I might need a script to drop constraints temporarily for testing
There may be orphaned tables in the schema, I plan to identify suspect tables.
Any other advice?
Edit: Also if anyone knows good tools to automate part of the process, we're using MS SQL 2000 (old, I know).
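For the "identify suspect tables" step, one cheap heuristic is to flag tables that neither reference nor are referenced by any other table through a foreign key. The sketch below uses SQLite's catalog so it runs standalone. On SQL Server the same information comes from the INFORMATION_SCHEMA views mentioned further down. The result is only a list of candidates, since a table with no foreign keys may still be used by application code, views or procedures.

```python
import sqlite3


def suspect_orphans(conn: sqlite3.Connection) -> list[str]:
    """List tables with no foreign-key relationship to any other table.

    Heuristic only: self-references are ignored, and usage from views,
    procedures or application code is not checked.
    """
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    linked = set()
    for t in tables:
        for fk in conn.execute(f'PRAGMA foreign_key_list("{t}")'):
            referenced = fk[2]  # third column is the referenced table
            if referenced != t:
                linked.update({t, referenced})
    return [t for t in tables if t not in linked]
```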
I think the basic idea is good.
The nice thing about building all the tables first and then building all the constraints is that the tables can be created in any order. When I've done this, I had one file per table, which I put in a directory called "Tables", and then a script that executed all the files in that directory. Likewise, I had a folder for constraint scripts (which did foreign keys and indexes too), which were executed after the tables were built.
I would separate the build of the triggers and stored procedures, and run these last. The point about these is they can be run and re-run on the database without affecting the data. This means you can treat them just like ordinary code. You should include "if exists...drop" statements at the beginning of each trigger and procedure script, to make them re-runnable.
So the order would be
table creation
add indexes
add constraints
Then
add triggers
add stored procedures
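The phase order above, together with the "if exists...drop" advice, can be sketched as a small runner. It targets SQLite so it is self-contained, and the scripts are hypothetical. SQLite has no stored procedures and cannot add foreign keys to an existing table, so those two phases are empty placeholders here. The point is that the structure phases run once on an empty database, while the code phases can be re-run safely on a database that already holds data.

```python
import sqlite3

STRUCTURE_PHASES = ["tables", "indexes", "constraints"]
CODE_PHASES = ["triggers", "procedures"]

# Hypothetical scripts, one entry per file; each phase runs its files in name order.
SCRIPTS = {
    "tables": {
        "customer.sql": "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, updated TEXT);",
    },
    "indexes": {
        "ix_customer_name.sql": "CREATE INDEX ix_customer_name ON customer(name);",
    },
    "constraints": {},
    "triggers": {
        # Drop-then-create makes the script re-runnable, like "if exists ... drop".
        "trg_customer_updated.sql": """
            DROP TRIGGER IF EXISTS trg_customer_updated;
            CREATE TRIGGER trg_customer_updated AFTER UPDATE OF name ON customer
            BEGIN
                UPDATE customer SET updated = datetime('now') WHERE id = NEW.id;
            END;""",
    },
    "procedures": {},
}


def run_phases(conn: sqlite3.Connection, phases: list[str]) -> None:
    for phase in phases:
        for name in sorted(SCRIPTS[phase]):
            conn.executescript(SCRIPTS[phase][name])


def build(conn: sqlite3.Connection) -> None:
    """Full build of an empty database: structure first, then code."""
    run_phases(conn, STRUCTURE_PHASES + CODE_PHASES)


def redeploy_code(conn: sqlite3.Connection) -> None:
    """Safe on a live database: only drop-and-recreate objects are touched."""
    run_phases(conn, CODE_PHASES)
```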
On my current project we are using MSBuild to run the scripts. There are some extension targets you can get for it that allow you to call SQL scripts. In the past I have used Perl, which was fine too (and batch files, which I would not recommend; they're too limited).
@Adam
Or how about just by domain: a useful grouping of related tables in the same file, but separate from the rest?
The only problem is if some domains (in this somewhat legacy system) are tightly coupled. Plus, you have to maintain the dependencies between your different sub-scripts.
If you are looking for an automation tool, I have often worked with EMS SQLManager, which allows you to automatically generate a DDL script from a database.
Data inserts into reference tables might be mandatory before putting your database online. This can even be considered part of the DDL script. EMS can also generate scripts for data inserts from existing databases.
The need for indexes might not be properly estimated at the DDL stage. At that point you only need to declare them for primary/foreign keys. Other indexes should be created later, once views and queries have been defined.
What you have there seems to be pretty good. My company has on occasion, for large enough databases, broken it down even further, perhaps to the individual object level. In this way each table/index/... has its own file. Can be useful, can be overkill. Really depends on how you are using it.
@Justin
Splitting by domain is almost always sufficient. I agree that there are some complexities to deal with when doing it this way, but they should be easy enough to handle.
I think this method provides a little more separation (which you will come to appreciate in a large database) while still being pretty manageable. We also write Perl scripts that do a lot of the processing of these DDL files, so that might be a good way to handle it.
There is a neat tool that will iterate through the entire SQL Server and extract all the table, view, stored procedure and UDF definitions to the local file system as SQL scripts (text files). I have used it with 2005 and 2008, but I'm not sure how it will work with 2000. Check out http://www.antipodeansoftware.com/Home/Products
Invest the time to write a generic "drop all constraints" script, so you don't have to maintain it.
A cursor over the results of the following queries does the trick.
Select * From Information_Schema.Table_Constraints
Select * From Information_Schema.Referential_Constraints
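If the cursor loop is done client-side instead of in T-SQL, generating the statements from those rows could look like the sketch below. It assumes the rows were already fetched (with any SQL Server driver) as `TABLE_SCHEMA, TABLE_NAME, CONSTRAINT_NAME, CONSTRAINT_TYPE` from `INFORMATION_SCHEMA.TABLE_CONSTRAINTS`. Foreign keys are dropped first, because SQL Server will not drop a primary key or unique constraint that a foreign key still references.

```python
# Row shape assumed from:
#   SELECT TABLE_SCHEMA, TABLE_NAME, CONSTRAINT_NAME, CONSTRAINT_TYPE
#   FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS
Row = tuple[str, str, str, str]


def quote(name: str) -> str:
    """Bracket-quote a SQL Server identifier, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def drop_constraint_statements(rows: list[Row]) -> list[str]:
    """Build ALTER TABLE ... DROP CONSTRAINT statements, foreign keys first."""
    ordered = sorted(rows, key=lambda r: (r[3] != "FOREIGN KEY", r[0], r[1], r[2]))
    return [f"ALTER TABLE {quote(s)}.{quote(t)} DROP CONSTRAINT {quote(c)};"
            for s, t, c, _type in ordered]
```

Because the list is generated from the catalog each time, nothing needs updating when constraints are added. That is the point of making the script generic.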
I previously organised my DDL code as one file per entity and made a tool that combined these into a single DDL script.
My former employer used a scheme where all table DDL was in one file (stored in Oracle syntax), indices in another, constraints in a third and static data in a fourth. A change script was kept in parallel with this (again in Oracle syntax). The conversion to SQL was manual. It was a mess. I actually wrote a handy tool that converts Oracle DDL to SQL Server (it worked 99.9% of the time).
I have recently switched to using Visual Studio Team System for Database Professionals. So far it works fine, but there are some glitches if you use CLR functions within the database.