How to initialize a SQL database and tables for a server-side application?

I'm integrating a SQL database (PostgreSQL) into my first ever Java backend system.
My dilemma now is: should the application create the database and the tables by itself, or should it just assume that the required database, already initialized with the tables, will be available?
If the application should not manage the database by itself, then what's the best approach to reproducing the database on developers' machines (I want to be able to run integration tests against a real database locally)?

I ended up using Flyway, as @Max suggested in his comment.
The setup was quite easy and intuitive. I used the default settings for the time being.
Setup of integration testing on development machine was a breeze.
The only thing that I don't particularly like is that cleaning the database between tests using Flyway is very slow, because it doesn't just clean the tables, it drops the entire schema.
This, however, is not a big issue for me right now (I currently have ~100 tests), and it probably won't be difficult to optimize in the future if needed.
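For reference, a minimal sketch of how the Flyway Java API can be wired into an integration-test setup. The connection details are placeholders, and exact configuration calls may differ slightly between Flyway versions:

import org.flywaydb.core.Flyway;

public class TestDatabase {

    // Placeholder connection details for a local PostgreSQL instance
    private static final String URL = "jdbc:postgresql://localhost:5432/myapp_test";
    private static final String USER = "myapp";
    private static final String PASSWORD = "secret";

    private static final Flyway flyway = Flyway.configure()
            .dataSource(URL, USER, PASSWORD)
            .load();

    // Run once before the test suite: bring the schema up to the latest version
    public static void migrate() {
        flyway.migrate();
    }

    // Between tests: drop everything and migrate again (simple, but slow, as
    // noted above; truncating the tables instead is a common optimization)
    public static void reset() {
        flyway.clean();
        flyway.migrate();
    }
}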

Related

How to keep 2 Database Schemas consistent without affecting the data at all?

I have two server machines (one for development, the other for clients) with SQL Server 2008 installations. Whenever a developer makes changes to tables/views/stored procedures on the Development Server, the changes need to be reflected on the Client Server as well.
Currently, I am manually handling all changes, like new columns in tables, changes in stored procedures, etc. Can DB scripts or replication automate the entire procedure for me? Or is there some better solution to keep the database schemas consistent?
Help will be highly appreciated.
Thanks!
I highly recommend creating an environment where all schema changes are done exclusively through SQL scripts - never "manually" in any environment. Each developer has to commit the script related to his/her bugfix (or new feature) to a version control system.
Typically you'd have one big script that creates the database from scratch and one for each version upgrade (one from 1.0 to 1.1, one from 1.1 to 1.2, and so on).
If you have the manpower, it is also very handy to maintain one "from-scratch" script for each version. Whether you need that or not depends on how often an installation on an empty system is done.
We have had very good experience using Liquibase to maintain all this. It automatically keeps track of which patches have been applied to a database and which need to be run during an upgrade. It also prevents you from running the same migration twice.
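As an illustration, Liquibase can also be embedded and run from application code. This is only a sketch; the connection string and changelog path are placeholders, and the exact API details shift between Liquibase versions:

import liquibase.Liquibase;
import liquibase.database.Database;
import liquibase.database.DatabaseFactory;
import liquibase.database.jvm.JdbcConnection;
import liquibase.resource.ClassLoaderResourceAccessor;

import java.sql.Connection;
import java.sql.DriverManager;

public class UpgradeRunner {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details
        try (Connection conn = DriverManager.getConnection(
                "jdbc:sqlserver://localhost;databaseName=MyApp", "sa", "secret")) {
            Database database = DatabaseFactory.getInstance()
                    .findCorrectDatabaseImplementation(new JdbcConnection(conn));
            // db/changelog.xml lists every change set, in order, under version control
            Liquibase liquibase = new Liquibase(
                    "db/changelog.xml", new ClassLoaderResourceAccessor(), database);
            // Applies only the change sets not yet recorded in the tracking table
            liquibase.update("");
        }
    }
}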
A problem that all database applications have, and a difficult one to resolve. Such a solution cannot be scheduled, as the changes made by developers need to be tested first, and you certainly don't want untested code merged with your live database. This question is of interest to me because I'm currently writing a generic solution to resolve this issue once and for all.
But in the meantime, we're using an open-source product called Open DBDiff (Google it - you can't miss it), which could do with some polishing but works well enough. You pass it your source and target databases, and it generates a script to make the target the same as the source. It does seem to have some trouble copying assemblies and user roles, but for everything else I haven't had any trouble.
I believe a human should do the deployments, after making sure the changes have been tested and properly checked into source control. This is not something to automate fully.
Humans should use the tools, though. I use Visual Studio 2010 Professional, which has a powerful schema comparison tool, generates and executes deployment scripts, and has source control integration.

Strategies for Fixing Problems / Tweaking NHibernate Apps in Production

First off, I am not a DBA, but I do work in an environment where DBAs do tune/make changes in the production database from time to time in ways that do not cause the need for an application rebuild/redeployment. Usually these changes consist of reworking indexes, changing procs, and sometimes changing the table structure in minor ways (usually abstracted from the app via procs).
Obviously, a team should strive to catch performance problems with NHibernate before they get into production using things like NHProf, SQL Profiler, and load tests. That being said, are there certain strategies that can be used to allow some amount of tweaking once the code is built and out running in production? Using stored procedures 100% of the time seems like it would allow the most flexibility for the DBAs, but obviously that would really kill the efficiency of NHibernate. From what I've read, updatable views (in SQL Server) don't really work that well with NHibernate either (this may or may not be true).
I've read quite a bit about NHibernate and experimented with it over the years, but I have never put it into practice in a production environment. I have yet to come across a set of "best practices" to allow for maximum tweaking once deployed.
As an NHibernate user, how are you and your team dealing with issues if they arise in production? My production environment is made up of ASP.NET apps and SQL server, but I don't think the answers need to be restricted to that platform.
I am in a similar position, and in order to keep our DBA happy, I did the following:
Wrote some of the queries in HQL, and some others in SQL (especially the performance-sensitive ones).
Externalized those queries to files, one file per query.
When your app needs to execute one of these queries, it just loads the appropriate file, optionally runs it through a pre-processor, and executes it.
With this approach, the DBA could theoretically tweak the queries just by modifying those files. That's quite similar to having stored procedures.
In practice, it's up to you to decide if you'll really give the DBA access to those files (if you catch my drift...)
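The question is about NHibernate/.NET, but the pattern itself is stack-agnostic; here is a rough Java/JDBC sketch of the same idea. The directory layout, file names, and pre-processing hook are invented for illustration:

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class ExternalQueries {

    // Directory the DBA is allowed to edit; one file per query (hypothetical layout)
    private static final Path QUERY_DIR = Path.of("sql/queries");

    // Loads e.g. sql/queries/find-overdue-orders.sql
    static String load(String name) throws Exception {
        String sql = Files.readString(QUERY_DIR.resolve(name + ".sql"), StandardCharsets.UTF_8);
        // An optional pre-processing step (e.g. substituting a schema prefix) would go here
        return sql;
    }

    // Runs a named query with positional parameters; the caller closes the ResultSet
    static ResultSet run(Connection conn, String name, Object... params) throws Exception {
        PreparedStatement ps = conn.prepareStatement(load(name));
        for (int i = 0; i < params.length; i++) {
            ps.setObject(i + 1, params[i]);
        }
        return ps.executeQuery();
    }
}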
IMHO the DBA should just use the DBMS's profiling tools and report her findings back to the devs (as in "there's this query that is running 20 times/sec and does 10 joins - is that really necessary? can it be cached? do you really need all those joins? can we denormalize this?", etc.).
I'm not in the deploy phase yet, but on my current project I've come up against this already, and my present solution has been to replace my queries with stored procs. As long as the shape of the data coming back from the DB remains the same, it's not a big deal. Yeah, you do lose some of that agility you enjoyed during development, but I'm not sure it's as bad as it initially sounds. You'll have a code push when you first make the change, of course, and from that point on it's just proc changes.
You can use a profiler like NHProf to see the SQL queries being executed, so you can show them to a DBA. This tool can also detect some problems, like N+1 selects.
Using a second-level cache can be useful: http://web.archive.org/web/20110514214657/http://blogs.hibernatingrhinos.com/nhibernate/archive/2008/11/09/first-and-second-level-caching-in-nhibernate.aspx

Dynamic patching of databases

Please forgive my long question. I have an idea for a design that I could use some comments on. Is it a good idea to do this? And what are the pitfalls I should be aware of? Are there other similar implementations that are better?
My situation is as follows:
I am working on a rewrite of a Windows Forms application that connects to a SQL Server 2008 (previously SQL Server 2005) server. The application is an "expert system" for an engineering company where we store structured data about constructions. We have control of all installations of the client software; we have no external customers or users, they are all internal to the company, and they can all be trusted not to do anything malicious to the software or database.
The current design doesn't have too many tables (about 10-20), but some of them have millions of records that belong to several hundred constructions. The system's performance has been OK so far, but it is starting to degrade as we push the limits of the design.
As part of the rewrite I am considering splitting the database into one master database and several "child" databases where each describes one construction. Each child database should be of identical design. This should eliminate the performance problems we are seeing today since the data stored in each database would be less than one percent of the total data amount.
My concern is that instead of maintaining one database we will now have hundreds of databases that must be kept up to date. The system is constantly evolving as the company's requirements change (you know how it is), and while we try to look ahead to reduce the number of changes, the changes will come. So we will need a system where we keep track of all database changes made to the system so they can be applied to the child databases. Updating the client application won't be a problem; we have good control of that aspect.
I am thinking of a change tracking system where we store database scripts for all changes in a table in the master database. We can then give each change a version number and store a current version number in each child database. When the client program connects to a child database, we check the version number of that database against the current version number of the master database, and if there are patches with version numbers greater than the child database's version number, we run these and update the child database to the latest version.
As I see it, this should work well. Any changes to the system will first be tested and validated before being committed as a new version of the database. The change will then be applied to a child database the first time a user opens it. I suppose we would open the database in exclusive mode while applying the changes, but as long as the changes aren't too frequent this should not be a problem.
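To make the idea concrete, a rough sketch of that version check in Java. All table, column, and method names here are only placeholders, and each patch is assumed to be a single executable batch:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class ChildDatabasePatcher {

    // Reads pending patches from the master database and applies them to the child
    static void patchToLatest(Connection master, Connection child) throws Exception {
        int childVersion = currentVersion(child);
        String sql = "SELECT version, script FROM schema_patches "
                   + "WHERE version > ? ORDER BY version";
        try (PreparedStatement ps = master.prepareStatement(sql)) {
            ps.setInt(1, childVersion);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    int version = rs.getInt("version");
                    String script = rs.getString("script");
                    try (Statement st = child.createStatement()) {
                        st.execute(script);
                        st.execute("UPDATE schema_version SET version = " + version);
                    }
                }
            }
        }
    }

    // Version number currently recorded in the child database
    static int currentVersion(Connection child) throws Exception {
        try (Statement st = child.createStatement();
             ResultSet rs = st.executeQuery("SELECT version FROM schema_version")) {
            rs.next();
            return rs.getInt(1);
        }
    }
}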
So what do you think? Will this work? Have any of you done something similar? Should we scrap the solution and go for the monolithic system instead?
Have you considered partitioning your large tables by 'construction'? This could alleviate some of the growing pains by splitting the storage for the tables across files/physical devices without needing to change your application.
Adding spindles (more drives) and performing a few hours of DBA work can often be cheaper than modifying/adapting software.
Otherwise, I'd agree with @heikogerlach and these similar posts:
How do I version my ms sql database
Mechanisms for tracking DB schema changes
How do you manage databases in development, test and production?
I have a similar situation here, though I use MySQL. Every database has a versions table that contains the version (simply an integer) and a short comment describing what has changed in this version. I use a script to update the databases. Every database change lives in one function, or sometimes one change is made up of multiple functions. The functions contain the version number in the function name. The script looks up the highest version number in a database and applies only the functions that have a higher version number, in order.
This makes it easy to update databases (just add new change functions) and allows me to quickly upgrade a recovered database if necessary (just run the script again).
Even if you test the changes beforehand, this allows for defensive changes. If you make some heavy changes to a table and you want to play it safe:
def change103(conn):
    """Create new table."""

def change104(conn):
    """Transfer data from the old table to the new table and make
    complicated changes in the process.
    """

def change105(conn):
    """Drop old table."""

def change106(conn):
    """Rename new table to old table."""
If something goes wrong in change104() (and it throws an exception), you can simply delete the already-converted data from the new table, fix your change function, and run the script again.
But I don't think that changing a database dynamically when a client connects is a good idea. Sometimes changes can take some time. And the software that accesses a database should match the schema of the database; you somehow have to keep them in sync. Maybe you could distribute a new software version and then upgrade the database when a client actually starts to use this new software. But I haven't tried that.
Better not to create additional databases. At first glance you may think that you'll get some performance gain, but actually you get a support nightmare. Remember - what can break does break, sooner or later.
It is way simpler to perform and optimize queries in a single database. It is much easier to manage user permissions in a single database. It is much easier to make consistent backups of a single database.
Like KenG said, if you need to break up your large tables, consider partitioning them. And add some drives :)
But first, run SQL Profiler on your database and optimize indexes and queries. Several million rows are usually not a big problem to handle (unless your customer needs live totals over half of them, in which case no partitioning can help).
I know that this is a crazy answer but here it goes...
I currently have a similar scenario where I need to keep control of database versions in multiple locations for a system using MS SQL Server.
What I am doing now is using Ruby on Rails ActiveRecord migrations to keep control of database versions. Yes, I know that we are talking about Windows systems, but this works fine for me. (By the way, my system is programmed in VB and .NET.)
I have installed Rails on each server; when I need to update the database schema, I copy the migration files to the server and run rake db:migrate, which updates the database to the latest version or rolls it back to a desired version.
As a side effect you will have a set of migration files for your database schema in a database-independent language (in this case Ruby) that you can apply to other database engines and that you can put under source control too.
I know that this is a strange solution in which a totally different technology is used but it does not hurt to learn new approaches. You can find additional information here.
I have become a better .NET programmer since I learned Ruby on Rails. I asked a question about this approach here before.

How can we migrate to using VS2005's Database Projects?

At my company, our current method of updating the database is to connect using the Server Explorer in VS2005, then modify the stored procedures by opening and editing them. The devs here seem to enjoy that "write and save it like it's code" mentality. It is pretty convenient how it automatically turns CREATE into ALTER and runs the scripts against the existing database when we need to tweak something.
Recently, this bit us pretty hard during a server crash when we lost a lot of changes that hadn't been backed up. I'm pushing to move our SQL development where it belongs: into DB Projects, so we can put them into SVN along with the other code. The alternative is nightly backups of the database.
I don't know much about DB Projects, though, or what the workflow with them is like. I'm afraid that if I can't get something of similar utility to their current model, they just won't switch. Any thoughts on maintaining our current working model, but switching over to DB Projects?
If the developers make the rules (and your post sounds like they do), you can only proceed if the new workflow is "better" to them. Being a developer myself, I think that's the way it should be. I've seen some non-developers think up pretty nonsensical development processes, and force them on the developers to everyone's detriment.
If you're thinking about VS DB projects, you'd first test whether they actually work with your database. If they do, you'd have to set up a big change in process: the "true" copy of the database now lives in the DB project instead of on the database server.
Another way out is to back up the development server regularly. If you take a full backup daily and a transaction log backup every hour, it becomes very hard to lose a significant amount of work.
Or create a scheduled job that writes the entire database definition to a text file. (Script all objects in the database.) These files are usually very small, so you can keep a long backlog.
Many respected bloggers seem to think storing database definitions in SVN is a good idea. See this Coding Horror post, or the related Stack Overflow question How do I version my MS SQL database in SVN.
Talk it over with the developers and see what you can agree on.

At what point should someone decide to switch Database Systems

When developing, whether for the web or desktop, at what point should a developer switch between SQLite, MySQL, MS SQL, etc.?
It depends on what you are doing. You might switch if:
You need more scalability or better performance - say from SQLite to SQL Server or Oracle.
You need access to more specific datatypes.
You need to support a customer that only runs a particular database.
You need better DBA tools.
Your application is moving to a different platform where your database no longer runs, or its libraries do not run.
You have the ability/time/budget to actually make the change. Depending on the situation, the migration could be a bigger project than everything in the project up to that point. Migrations like these are great places to introduce inconsistencies, or to lose data, so a lot of care is required.
There are many more reasons for switching and it all depends on your requirements and the attributes of the databases.
You should switch databases at milestone 2.3433, 3ps prior to the left branch of dendrite 8,151,215.
You should switch databases when you have a reason to do so, would be my advice. If your existing database is performing to your expectations, supports the load that is being placed on it by your production systems, has the features you require in your applications and you aren't bored with it, why change? However, if you find your application isn't scaling, or you are designing an application that has high load or scalability requirements and your research tells you your current database platform is weak in that area, or, as was already mentioned, you need some spatial analysis or feature that a particular database has, well there you go.
Another consideration might be taking up the use of a database agnostic ORM tool that can allow you to experiment freely with different database platforms with a simple configuration setting. That was the trigger for us to consider trying out something new in the DB department. If our application can handle any DB the ORM can handle, why pay licensing fees on a commercial database when an open source DB works just as well for the levels of performance we require?
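As an illustration of that point, with an ORM like Hibernate the database choice is largely a matter of configuration. A sketch along these lines, where the connection values are placeholders and the property names assume Hibernate's standard configuration keys:

import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class SessionFactoryBuilder {

    // Swapping the database is (mostly) a matter of changing these values,
    // e.g. to an org.postgresql.Driver URL and a PostgreSQL dialect.
    static SessionFactory build(String driver, String url, String dialect,
                                String user, String password) {
        Configuration cfg = new Configuration()
                .setProperty("hibernate.connection.driver_class", driver)
                .setProperty("hibernate.connection.url", url)
                .setProperty("hibernate.connection.username", user)
                .setProperty("hibernate.connection.password", password)
                .setProperty("hibernate.dialect", dialect);
        // Mapped entity classes would be registered here,
        // e.g. cfg.addAnnotatedClass(Order.class)
        return cfg.buildSessionFactory();
    }
}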
The bottom line, though, is that with databases or any other technology, I think there are no "business rules" that will tell you when it is time to switch - your scenario will tell you it is time to switch because something in your solution won't be quite right, and if you aren't at that point, no need to change.
BrianLy hit the nail on the head, but I'd also add that you may end up using different databases at different levels of development. It's not uncommon for developers to use SQLite on their workstation when they're coding against their personal development server, and then have the staging and/or production sites using a different database tool.
Of course, if you're using extensions or capabilities specific to a certain database tool (say, PostGIS in PostgreSQL), then obviously that wouldn't work.