Good reference site for SQL/RDBMS - sql

One of the aspects of computer science/practical software engineering I am weaker at is actually doing significant work in database systems. That is to say, I can do simple queries on smaller datasets, no problem. However, working with complex queries on large datasets invokes a level of understanding of databases beyond me right now. For example, I built an amusing query some time ago that computed a join using a n^2 size where n=20,000- the hosting server suspended my account for blowing the CPU. Shocking.
I am interested in bringing myself up to speed on how to design schema and queries that, well, don't bring down the server. Pursuant to that end, what materials do you recommend that discuss professional database/SQL design and writing?

I would go to the bookstore and pick out some books on performance tuning for the database of your choice (it is very differnet depending onteh database backend). This will help you understand what not to do which is critical to designing databases.
Here's a site with a lot of good info
http://wiki.lessthandot.com/index.php/Category:Data_Management

For generic SQL I would go for Celko's books. For vendor specific, it depends on the platform of your choice. I know the SQL Server platform well and for that my praise go to the Inside series.
Blogs are also usefull, look at the all time SQL tag right here on SO and check the top answerers info, some have personal blogs that are very usefull. Eg. go through Quassnoi's blog, it has a LOT of useful info on MySQL, Oracle, SQL Server.

Related

Any Validity to the NoSQL movement?

First of all im relatively new to the Database world, Im graduating with my B.S. in Comp Science this semester and Database Technologies have really caught my eye so ive been studying alot of T-SQL because I want to in the end get a SQL Development job (MS SQL server seemed like the best choice right now because it's on the rise)
ANYWAYS, i've heard alot of hoopla about this NOsql movement of Non-relational database management systems. Trying to keep this question and non-subjective as possible i mainly want to know the advantages/disadvantages of NRDBMS's (like Nosql) and if there is really a future in them. Perhaps as a side question, is it a bad time to be studying SQL in general (specifically the normal RDBMS's we are so used to). I forsee people sticking with this for a long time, but then again.....I dont know. I'd hate to see my interest suddenly be taking a dive in the market.
There is definitely validity to the NoSQL movement, but I wouldn't worry about your SQL skills going to waste. NoSQL storage architectures were born out of the need for highly available and scalable data stores that went beyond what a typical relational database could provide. This comes at a cost though, and typically that cost is guaranteed consistency. This isn't always a large concern. In the case of something like Facebook doesn't have complete consistency for a period of time for things like your pictures, status updates, etc. As long as they get consistent at some point, it's okay. On the other end, take your bank account. That type of data store needs to provide the strong ACID characteristics that a relational database provides.
NoSQL isn't something that I see taking over the world, it's an alternative to the common approach of RDBMS's and as with everything else it has it's strengths and weaknesses.
Here is an excellent article on the subject written about NetFlix.
Others can address the NoSQL specifics better than I can, but as for the second part of your question (worrying about getting into SQL if NoSQL starts becoming more popular): I have customers who still use very old flat-file based mainframes.
SQL hasn't even reached full penetration yet, and it is VERY entrenched in a large number of business processes. The market for SQL development and maintenance won't be going away any time soon, and if it starts to it won't be overnight - you'll have time to learn the Next Big Thing before you're obsolete.
NoSql databases are great for storing unstructured data. Think of it as the next generation of Lotus Notes.
I wouldn't leverage a NoSql database for storing a list of people and addresses, as those are completely structured and well known.
However, if I had a set of dynamic attributes of some type (name/value pairs) or something a similar which required a lot of pivoting to get to, then I'd seriously look into it. I might even go that route even if there is structure, but it isn't known ahead of time. Such as with dynamic tables.
That said, when we did some evaluations earlier this year (March 2010) and we didn't think the state of the available open source NoSql databases were ready for serious production. There's a lot more to databases than just putting data in and getting it out. Automated backups, load balancing, solid query tools, consistency checkers, etc are an absolute must. We will reevaluate early next year.
SQL ain't going away, and the relational model is a basic information-systems building block that's definitely worth studying and understanding in its own right. I'd stick with it.
Databases based on an object instead of relational model have existed forever. The difference is that in the past they tended to be closed (and expensive!) packages from single vendors. No-one really wants to have their mission-critical apps locked into a proprietary database, dependent on licensing from a single, sometimes unresponsive, supplier.
In contrast today's NoSQL databases are typically free, open, and well-aligned to existing web-oriented technologies, allowing for quick, responsive scaling without worrying about licenses, and potential participation in future development (or local forking/patching if necessary).
What they also are is diverse, such that you can't really classify them all together as being good for a particular kind of task. There are trivial key-value buckets that make no attempt at being ACID-safe, there are object databases with their own safety paradigms (like CouchDB's revision conflicts), there are more traditional relational-like databases that just don't use SQL as a query mechanism (because let's face it, nice though it is that you can use the same query language across databases, hacking together SQL queries into a string just so that the database at the other end can pick the string apart to get the logic of the query you wanted to do, is a bit silly).
There are lots of them, most are very immature compared to the ancient edifice of SQL, and it'll take a while for winners to emerge. Is NoSQL “valid”? Sure. But I would say to use a particular NoSQL database as a basis for study today (as opposed to using one that fits your needs for a particular task that SQL is bad at) would be premature.
The future of big systems will require skills with both SQL and NoSQL.
NoSQL is an important paradigm and it's not going anywhere. Joins don't scale horizontally and SQL database are effectively just big "join machines". NoSQL is still in relative infancy, there are tons of players and just like SQL, each one has its own little variations.
But that's all going to shake out in the next few years
As a recent grad, you have to start somewhere. SQL is simply the easiest place to start. You will see lots of it going forward. However, once you've got your head around SQL (say you've passed your MS T-SQL course), I strongly suggest taking a look at something like MongoDB/Riak/CouchDB as your next adventure.
You probably won't jump into a company using NoSQL, but you will run in to problems where NoSQL is actually a much simpler solution. But you won't know this until you actually play with NoSQL.
It sounds like you're already pointing in the right direction by looking at job postings and seeing what current needs are in the way of data storage and management, if this is your passion. I wouldn't be surprised if interviews will start asking about the advantages/disadvantages of nosql just to see if you're familiar with the latest developments (and if you're applying for a dba position, they might also ask about ACID compliance and the CAP theorem).
Lots of companies are starting to use NoSQL technologies, so it's valid in that people are using it. And not just small startups either, but companies like facebook (cassandra), yahoo (hadoop), google (bigtable), and etsy (mongodb) believe that nosql solutions fit certain needs.
I think NoSQL is more of a niche. It's really good for some applications, but will probably never totally displace RDBMSs (although combinations of NoSQL on top of an RDBMS backend seem to be coming out more I hear). Advice would be to get good with an old-school RDBMS (it's still much more common, at least from what I've seen), and then get into NoSQL on the side if you want.
Brent Ozar did an excellent writeup on the topic here: NoSQL Basics for Database Administrators

Set of Tools to optimize the performance in general of SQL Server

I know there are things out there to help to optimize queries, ect... but is there anything else, something like a full package that can scan your database and highlight all the performance issues, naming conventions, tables not properly normalized, etc?
I know this is the job of a DBA and if the DBA is good, he shouldn't need a tool like that, but sometimes you start a new job, you get in charge of an existing database and the DB is a mess, so you don't know where to start...
Thanks to everyone
Dave
In my opinion a normalized database does not guarantee good performance. Normalization is concerned primarily with data consistency.
It's not really practical for an automated tool to handle what you are proposing because there is no one case fits all implementation for best practice, which after all is to be treated as more of a guideline. That's why businesses hire people like me to take an objective look at their unique SQL Server environment (shameless plug).
The performance aspect can be addressed either by the wealth of features already available in the SQL Server product or by off the shelf tools such a those from Quest/Redgate and the like.
If you want to get a quick overall feel for the performance of a new SQL Server box that has come under your administrative control then I suggest either using the freely available Performance Dashboard Reports or SQL Server DMV's. You could also take a look at the current Wait Types on the server.
SQL Server 2005 Performance Dashboard
Reports
SQL Server Wait Type Example
How to identify the most costly SQL
Server queries using DMV’s
I hope this answers your question and do let me know if I can assist further.
Edited in response to comment:
Maybe a general Health Check could provide useful info.
SQL Server Best Practice
Analyzer

What are good alternatives to SQL (the language)? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I occasionally hear things about how SQL sucks and it's not a good language, but I never really hear much about alternatives to it. So, are other good languages that serve the same purpose (database access) and what makes them better than SQL? Are there any good databases that use this alternative language?
EDIT:
I'm familiar with SQL and use it all the time. I don't have a problem with it, I'm just interested in any alternatives that might exist, and why people like them better.
I'm also not looking for alternative kinds of databases (the NoSQL movement), just different ways of accessing databases.
I certainly agree that SQL's syntax is difficult to work with, both from the standpoint of automatically generating it, and from the standpoint of parsing it, and it's not the style of language we would write today if we were designing SQL for the demands we place on it today. I don't think we'd find so many varied keywords if we designed the language today, I suspect join syntax would be different, functions like GROUP_CONCAT would have more regular syntax rather than sticking more keywords in the middle of the parentheses to control its behavior... create your own laundry list of inconsistencies and redundancies in SQL that you'd like/expect to see smoothed out if we redesigned the language today.
There aren't any alternatives to SQL for speaking to relational databases (i.e. SQL as a protocol), but there are many alternatives to writing SQL in your applications. These alternatives have been implemented in the form of frontends for working with relational databases. Some examples of a frontend include:
SchemeQL and CLSQL, which are probably the most flexible, owing to their Lisp heritage, but they also look like a lot more like SQL than other frontends.
LINQ (in .Net)
ScalaQL and ScalaQuery (in Scala)
SqlStatement, ActiveRecord and many others in Ruby,
HaskellDB
...the list goes on for many other languages.
I think that the underlying theme today is that rather than replace SQL with one new query language, we are instead creating language-specific frontends to hide the SQL in our regular every-day programming languages, and treating SQL as the protocol for talking to relational databases.
Take a look at this list.
Hibernate Query Language is probably the most common. The advantage of Hibernate is that objects map very easily (nearly automatically) to the relational database, and the developer doesn't have to spend much time doing database design. Check out the Hibernate website for more info. I'm sure others will chime in with other interesting query languages...
Of course, there's plenty of NoSQL stuff out there, but you specifically mention that you're not interested in those.
"I occasionally hear things about how SQL sucks and it's not a good language"
SQL is over thirty years old. Insights about "which features make something a 'good' language and which ones make it a 'bad' one" have evolved more rapidly than SQL itself.
Also, SQL is not a language that conforms to current standards of "what it takes to be relational", so, SQL just isn't a relational language to boot.
"but I never really hear much about alternatives to it."
I invite you to ponder the possibility that you are trying to hear only in the wrong places (that is, the commercial DBMS industry exclusively).
"So, are other good languages that serve the same purpose (database access) and what makes them better than SQL?"
Date&Darwen describe the features that a modern data manipulation language must conform to in their "Third Manifesto", the most recent version of which is laid down in their book "Databases, Types & the Relational Model".
"Are there any good databases that use this alternative language?"
If by "good", you mean something like "industrial-strength", then no. The closest thing available would probably be Dataphor.
The Rel project offers an implementation for the Tutorial D language defined in "Databases, Types & The Relational Model", but the current prime goal of Rel is to be educational in nature.
My SIRA_PRISE project offers an implementation for "truly relational" data management, but I hesitate to also label it "an implementation of a language".
And of course, you might also look into some non-relational stuff, as some have proposed, but I personally dismiss non-relational data management as multiple decades of technological regression. Not worth considering, that is.
Oh, and by the way, a software system that is used to manage databases is not "a database", but "a DataBase Management System", "DBMS" for short. Just like a photograph is not the same thing as a camera, and if you are discussing cameras, and you want to avoid confusion, then you should be using the proper word "cameras" instead of "photograph".
Perhaps you're thinking of the criticism C. Date and his friends have uttered against existing relational databases and SQL; they say the systems and language aren't 100% relational, and should be. I don't really see any real problem here; as far as I can see you can have a 100% relational system, if you want, just by disciplining the way in which you use SQL.
What I personally keep running into is the lack of expressive power SQL inherits from its theoretical basis, relational algebra. One issue is the lack of support for the use of domain ordering, which you run into when you work with data marked by dates, timestamps, etcetera. I once tried to do a reporting application entirely in plain SQL on a database full of timestamps and it just wasn't feasible. Another is the lack of support for path traversal: most of my data look like directed graphs that I need to traverse paths in, and SQL can't do it. (It lacks "transitive closure". SQL-1999 can do it with "recursive subqueries" but I haven't seen them in actual use yet. There are also various hacks to make SQL cope but they're ugly.) These problems are also discussed by some of Date's writings, by the way.
Recently I was pointed at .QL which appears to address the transitive closure issue nicely, but I don't know whether it can resolve the issue with ordered domains.
Take a look at LINQ to SQL...
Tried it out a couple months ago and never looked back....
Direct answer: I don't think there's any serious contender out there. DBase and its imitators (Foxpro, Codebase etc) was a contender for a while, but I think they basically lost the database query language war. There have been many other database products that had their own query language, like Progress and Paradox and several others I've used whose names I don't remember and surely many more that I never heard of. But I don't think any other contender even came close to getting a non-trivial share of the market.
As simple proof that there is a difference between a database format and a query language, the last version of DBase I used -- many years ago now -- offerred both the "traditional" DBase query language and SQL, both of which could be used to access the same data.
Side ramble: I wouldn't say that SQL sucks, but it has many flaws. With the benefit of the years of experience and hindsight we now have, I'm sure one could design a better query language. But creating a better query language, and convincing people to use it, are two very different things. Would it be enough better to convince people that it was worth the trouble of learning. People have invested many years of their lives learning to use SQL effectively. Even if your new language is easier to use, there would surely be a learning curve. And how would you migrate your existing systems from SQL to the new language? Etc. It can be done, of course, just like C++, C#, and Java have largely overthrown COBOL and FORTRAN. But it takes a combination of technical superiority and good marketing to pull it off.
Still, I get a chuckle out of people who rush forward to defend SQL anytime someone criticizes it, who insist that any problem you have with SQL must be your own ineptitude in using it and not any fault of SQL, that you must just not have reached the higher plane of thingking necessary to comprehend its perfection, etc. Calm down, take a deep breath: We are insulting a computer language, not your mother.
Back in the 1980's, ObjectStore provided transparent object access. It was kind of like an RDBMS plus an ORM, except without all those extra leaky abstraction layers: it stored objects directly in the database.
So this alternative was really "no language at all", or perhaps "the language you're already using". You'd write C++ code and create or traverse objects as if they were native objects, and the database took care of everything as needed. Kind of like ActiveRecord but it actually worked as well as the ActiveRecord marketing blitzes claim. :-)
(Of course, it didn't have Oracle's marketing muscle, and it didn't have MySQL's zero-cost, so everybody ignored it. And now we try to replicate that with RDBMSs and ORMs, and some people try to argue that tables actually make sense for storing objects, and that writing giant XML file to tell your computer how to map objects to tables is somehow a reasonable solution.)
I think you might be interested in looking at Dataphor, which is an open-source relational development environment with its own database server (which speaks D), and the ability to derive user interfaces from its query language.
Also, it appears Ingres still supports QUEL, and it's open source.
The general movement these days is NoSQL; generally these technologies are:
Distributed "hashtables" that store data as key/value pairs
Document-oriented databases
Personally I think there is nothing wrong with SQL as long as it fits your needs. SQL is expressive and great for working with structured data.
SQL works fine for the domain for which it was designed — interrelated tables of data. This is generally found in traditional business data processing. SQL doesn't work that well when trying to persist a complex network of objects.
If your needs are to store and process relatively traditional data, use some SQL-based DBMS.
In response to your edit:
If you're looking for alternatives to the SQL DML for retrieving data from relational data stores, I've never heard of any serious alternative to SQL.
The knocks SQL gets are not, I think, so much against the language as opposed to the underlying data storage principles on which the language is based. People often confuse the language SQL with the relational data model on which RDBMSes are built.
Relational Databases are not the only kind of databases around. I should say a word about Object-Databases as I havn't seen it in responses from others. I had some experience with the Zope python framework that use ZODB for objects persistency instead of RDBMS (well, it's theoretically possible to replace ZODB by another database within zope but the last time I checked I didn't succeed to have it working, so can't be positive about that).
The ZODB mindset is really different, more like object programming that would happen to be persistent.
ORM can be seen as a kind of language
In a way I believe the Object-database model is what ORM are about : accessing persistent data through your usual object model. It's a kind of language and it's gaining some market share, but for now we don't see it as a language but as an abstraction layer. However I believe it would be much more efficient to use an ORM over an Object-database than over SQL (in other words performance of ORMs I happened to use using some SQL database as base layers sucked).
There are many implementations of SQL (SQL Server, mysql, Oracle, etc.), but there is no other language that serves the same purpose in the sense of being a general purpose language designed for relational data storage and retrieval.
There are object databases such as db4o, and there are similar so-called noSQL databases that refer to just about any data storage mechanism that doesn't rely on SQL, but most commonly open-source products like Cassandra based loosely on Google's Bigtable concept.
There are also a number of special-purpose database products like CDF, but you probably don't need to worry about those - if you need one, you'll know.
None of these are equivalent to SQL.
That doesn't mean they're "better" or "worse" - they're just not the same. Dennis Forbes wrote a great post recently breaking down a number of the strange claims surfacing against SQL. He maintains (and I agree) that these complaints originate largely from people and shops who have either picked the wrong tool for the job in the first place, or aren't using their SQL DBMS properly (I'm not even surprised anymore when I see another SQL database where every column is a varchar(50) and there's not a single index or key, anywhere).
If you are implementing yet another social networking site and aren't too concerned with ACID principles, by all means start looking into products such as db4o. If you are developing a mission-critical business system, however, I highly highly recommend that you think twice before joining the "SQL sucks" chorus. Do the research first, find out what features the various products can and cannot support.
Edit - I was busy writing my answer and didn't get the question update from a few minutes. Having said that, SQL is essentially inseparable from the DBMS itself. If you run a SQL database product, then you access it with SQL, period.
Perhaps you are looking for abstractions over the syntax; Linq to SQL, Entity Framework, Hibernate/NHibernate, SubSonic, and a host of other ORM tools all provide their own SQL-like syntax that is not quite SQL. All of these "compile down" to SQL. If you run SQL Server, then you can also write CLR Functions/Procedures/Triggers, which allows you to write code in any .NET language that will run inside the database; however, this isn't really a substitute for SQL, more of an extension to it.
I'm not aware of any full "language" that you can layer on top of a SQL database; short of switching to a different database product, you're eventually going to see SQL on the pipe.
SQL is de-facto.
Frameworks that try to shield developers from it have eventually created their own specific language (Hibernate HQL comes to mind).
SQL solves a problem fairly well. It is no more difficult to learn than a high level programming language. If you already know a functional language then it is a breeze to grasp SQL.
Considering the leading database vendors providing state of the art databases (Oracle and SQL Server) support SQL and have invested years into optimization engines, etc. and all leading data modelling software and change management software deals in SQL, I'd say it is the safest bet.
Also, there is more to a database than just queries. There is scalability, backup and recovery, data mining. The big vendors support a lot of things that even the new "cache" engines don't even consider.
Problems with SQL have motivated me to cook up a draft query language called SMEQL over at the Portland Pattern Repository wiki. Comments Welcome. It borrows ideas from functional programming and IBM's experimental Business System 12 language. (I originally called it TQL, but found later that name was taken.)
Within the .NET world, while it still has a SQL-esque feel to it, LINQ-to-SQL will allow you to have a good mix of SQL and in-memory .NET processing of your data. It also simplifies a lot of the lower-level data plumbing that nobody really wants to do.
If you want to see a database type of a completely different mindset, take a look at CouchDB. "Better" is obviously a relative requirement and this sort of non-relation database is "Better" but only in certain scenarios.
SQL the language is very powerful, and relational database management systems have been and still are a huge success. But there is a class of application that requires very high scalability and availability, but not necessarily a high degree of data consistency (eventual consistency is what matters). A variety of systems get better performance and scaling than an RDBMS by relaxing the need for full ACID compliant transactions. These have been named "NoSQL", but as others point out, this is a misnomer: that perhaps they should be called NoACID databases.
Michael Stonebraker covers this in The "NoSQL" Discussion has Nothing to Do With SQL.

Choosing the right database: MySQL vs. Everything else

It would seem these days that everyone just goes with MySQL because that's just what everyone goes with. I'm working on a web application that will be handling a large quantity of incoming data and am wondering if I should "just go with MySQL" or if I should take a look at other open-source databases or even commercial databases?
EDIT: Should have mentioned, am looking for optimal performance, integration with ruby + rails running on debian 5 and money is tight although if it will save money in the long run I would consider making an investment into something more expensive.
I've posted this before, but I have no reason to change this advice:
MySQL is easier to start using.
Nicer UI tools. Faster, if you don't use ACID. More tolerant of invalid data. Autoincrement columns are as easy as typing autoincrement. Permissions aren't as tied to the file systems and OS users. Setting a delimiter is easier than using PG's "dollar sign quoting" when writing a stored proc. In MySQL, you connect to all databases, not just one at a time.
Postgres (PG) is much more standards compliant, but it's uglier and more complicated, especially from a UI perspective. It used to require manual vacuuming, and actually enforces referential integrity (which is a great thing that can be a pain in the ass). Autoincrement is much more flexible, but requires sequences (which can me masked by using serial), and wait, what's an OID?
So if you don't really know or care much about databases, data validity, ACID compliance, etc, but you do care about ease and speed, you tend to go with MySQL.
Too many (not all, but many) "web programmers" know a lot about "web 2.0" or PHP or Java, but don't know much about database theory or practice ("an index? what's that?"). They tend to see a database as just a fancy hashtable or bag of data, and indeed one that's not anywhere as dynamically changeable or forgiving as a hashtable.
For these folks, MySQL -- because until 5.0 it wasn't really an RDBMS, and in many ways still is not -- is a godsend. It's "faster" than the competition, and doesn't "waste time" on "esoteric" database stuff a web programmer doesn't want, understand, or see the value of.
For somebody with a database background, on the other hand, MySQL is a minefield: stuff that should work (complicated views, group bys, order bys in group bys) may work or may if you're lucky crash the server, or if you're unlucky just give results with incorrect data.
I've spent days working around some of these things in admittedly complicated by not extraordinarily complex views and group bys.
And MySQL isn't really faster. If you're using InnoDb tables for ACID (or just because at more than 30 Million rows, MyISAM tables tend to get crappy), yes a straight one-table select is probably faster than in PG. But add in joins, and PG is suddenly significantly faster. (MySQL is especially bad at outer joins.)
In summary: if to you the database is a bag, if you never intend to do data mining or reporting, if you're mostly interested in serving up big hunks of text with few relations or updates -- that is, if you're using a database to power a blog, MySQL is a great choice.
But if you're actually managing data, if you understand that data lives longer and is more valuable to a business than front-end programs and middle-tier business rules , if you need the features of a real database, use PG.
A "web programmer" who has decided all his table structures can be auto-generated by Hibernate (or some other ORM) looks at that and says, "too complicated" and "I bet complicated means more cost and slower speed" and so he goes with MySQL.
As I said, PG is far superior, and I hate mucking with MySQL's bizarre bugs, and I think that overall PG performance is probably better than MySQL for any even slightly complicated query.
But MySQL makes things look (deceptively) simple, so you get a lot of people who don't really understand database design figuring that MySQL is a great choice.
Use PG. It's consistent, it's reliable, it's standards-compliant, it's faster on (even moderately) complicated queries, it doesn't completely throw off your schedule with weird bugs.
I think PostgreSQL is a very viable alternative to MySQL. It's much more Oracle-like.
Personally, I try to avoid MySQL whenever I can for the following reasons:
The default storage engine MyISAM lacks Foreign Key support. Innodb does, however it does not support Foreign Keys to MyISAM tables for obvious reasons.
If I attempt to insert invalid data, MySQL will happily change it for me.
Illegal DateTime, Date, or Timestamp values are convert to "zero": http://dev.mysql.com/doc/refman/5.1/en/datetime.html
Varchar and Char type columns: http://dev.mysql.com/doc/refman/5.1/en/char.html
Numeric Datatypes depending on SQL Strict Mode: http://dev.mysql.com/doc/refman/5.1/en/numeric-types.html
These things may not be of any importance to some people, but when it comes to quality of data, I would rather use something else.
Well, there are may differences between the RDBMS of the world.
Take a look at http://en.wikipedia.org/wiki/Comparison_of_relational_database_management_systems#Fundamental_features
Using this as a guide you should be able to narrow down your choices a lot.
A few things to keep in mind:
SQL Server 2005 Express is limited to a 4GB file size, but has excellent support int the .NET and Java languages.
MySQL will run on both Windows and Linux, and many languages support it (including .NET and Java) with external libraries.
SQLite is supported by effectively every operating system, and can be distributed as an integrated part of your application.
Firebird the open sourced and branched version of Borland's Interbase is pretty good, it works happily on most (all?) flavours of Linux and is very performant. I am not a RoR guy so I don't know the details, but I do have a friend in NZ who uses Firebird with all his RoR projects, it definitely works with RoR, and works well.
Edit: Found a link to a Firebird Rails Adapter here
Please be a little wary of the idealistic points-of-view put forth by people who may have axes to grind. MySQL is a capable, robust and scalable solution for many problems, but this is usually not so right out-of-the-box as its defaults are extremely conservative. Like any database product, you will need to spend time tuning its installation for the desired performance. You will also need to spend some time accomodating its limitations, which is also true for any database you choose.
MySQL's popularity is partly self-generated: a lot of hosting providers provide it, so a lot of people use it.
Mysql is great, and mssql is great. I haven't used anything else. I would say if you are completely on the fence, go with the technology stack you are strongest with. I have a good amount of c#, asp.net, and other Microsoft stack experience, so it is pretty natural for me to specialize in mssql. If you are more familiar with *nix, php, etc, you may be more at home sticking to an open source stack. You can certainly mix and match the two stacks, but sticking to one world or the other can avoid some pain for you.
Like duffymo I also recommend PostgreSQL. It is very nice from a developer perspective:
work on many platforms (both unix-based and Windows), has vary stable interfaces to any laguage/environment I worked (Windows, Linux, Delphi, Java, Perl, Python). Stored procedure language: PLPGSQL is also easy and powerful. User support (newsgroups, lists, SO) is nice and helpful.
"It would seem these days that everyone just goes with MySQL because that's just what everyone goes with." If MySQL is the only thing that people use then why are Oracle and MSSQL still around?
The debate as to which database engine to go for can be talked about until the cows come home. I personally have always found one constant in choosing a databse engine. The one you can afford is generally the one you go for.
If you can justify spending XXX on a database then you probably know the reasons already for choosing it.
MySQL is easy to install and it works fine without special settings. With the proper approach MySQL can flexibly adjust to your needs. But there are also some pitfalls: in some cases it may slow down your project, no matter how well you have tuned the DBMS and the data structure.
MySQL is for you if:
you do not want to delve into DBMS settings;
you think structurally;
integration with MySQL is in any programming language, framework, CMS, CMF and so on;
you need DBMS to manage small structural data (up to 1 or 2 gigabytes).
You mentioned, that you deal with large quantity of incoming data. Probably this guide would be helpful

Can you recommend a good source for Teradata Best Practices?

Looks like my data warehouse project is moving to Teradata next year (from SQL Server 2005).
I'm looking for resources about best practices on Teradata - from limitations of its SQL dialect to idioms and conventions for getting queries to perform well - particularly if they highlight things which are significantly different from SQL Server 2005. Specifically tips similar to those found in The Art of SQL (which is more Oracle-focused).
My business processes are currently in T-SQL stored procedures and rely fairly heavily on SQL Server 2005 features like PIVOT, UNPIVOT, and Common Table Expressions to produce about 27m rows of output a month from a 4TB data warehouse.
One place to start is here: http://www.teradataforum.com/
This might be a little late, but there are a few things which I can warn you about Teradata which I have learned.
Use the most recent version as often as possible.
For V12 the optimizer was re-written and the database performs much better now.
Try to realize that SQL Server and Teradata are very different beasts, most of the concepts will not transition well.
Do not underestimate the importance of a primary index.
The locks that teradata uses are very primitive when compared to other databases.
Do NOT use TERA mode. You do not have any code which is legacy, ANSI mode is far superior and is widely encouraged.
Join indexes are very helpful tools, but they do not provide all the answers.
Parallelism, take the time to understand how FASTLOAD, MULTILOAD, and TPUMP works and find out how one can leverage it with their ETL strategy.
If you are attempting to run a query which needs to be performant, do not use any casts, the optimizer will not use statistics to generate the best execution plan.
Working with dates are going to be a pain, just a warning.
Teradata is very DDL oriented, try to understand all the syntax related when creating a table.
Compression is a wonderful tool, if you have any values which are repeated in a table, make use of it.
There are not many tools available with Teradata, be prepared to build a lot. The tools that exist are very expensive.
Unfortunately, I do not know much about SQL Server, so I cannot say what tools in SQL Server appear in Teradata.
Hope this helps
I would also look into the recently launched Teradata Developer Exchange as well as the TeradataForum and forums on Teradata's main website.
I don't know of any good references available online. Teradata has some design manuals that are available for download, but they're more instruction manuals and not "best practices" as such. check them out here: http://www.info.teradata.com/DataWarehouse/eTeradata-BrowseBy.cfm?page=Teradata%20Database
Alternatively, you need to find a friendly Teradata expert to bounce ideas off. Try Teradata themselves, or find a local consultant with Teradata experience.
Best Practices on Teradata isn't a topic that gets lots of discussions and most of the best tricks tend to be proprietary knowledge of the person/people who discovered them.
Sorry,
David Stewardson
Satyam Computer Services
Top of the list on a Google search for "Teradata Best Practices" gave me TERADATA ADVISORY GROUP SETS BEST PRACTICES FOR BUSINESS OBJECTS AND TERADATA CUSTOMERS
EDIT: Seeing as that's just advertising, as you've pointed out, see how you go with these. Please bear in mind that I don't have a clue what Teradata is and can't see myself using it any time this side of the 22nd century AD.
Teradata Discussion Forums
Best Practices for Teradata Deployments
Best Study Guides For NCR Teradata Certifications
The middle one looks promising with it's nice long link tree at the top
Oracle® Business Intelligence Applications Installation and Configuration Guide > Preinstallation and Predeployment Considerations for Oracle BI Applications > Teradata-Specific Database Guidelines for Oracle Business Analytics Warehouse >
and the first link, to the forums, should put you in touch with the right people.