Reasons for SQL differences - sql

Why are SQL distributions so non-standard despite an ANSI standard existing for SQL? Are there really that many meaningful differences in the way SQL databases work or is it just the two databases with which I have been working: MS-SQL and PostgreSQL? Why do these differences arise?

The ANSI standard specifies only a limited set of commands and data types. Once you go beyond those, the implementors are on their own. And some very important concepts aren't specified at all, such as auto-incrementing columns. SQLite just picks the first non-null integer, MySQL requires AUTO INCREMENT, PostgreSQL uses sequences, etc. It's a mess, and that's only among the OSS databases! Try getting Oracle, Microsoft, and IBM to collectively decide on a tricky bit of functionality.

It's a form of "Stealth lock-in". Joel goes into great detail here:
http://www.joelonsoftware.com/articles/fog0000000056.html
http://www.joelonsoftware.com/articles/fog0000000052.html
Companies end up tying their business functionality to non-standard or weird unsupported functionality in their implementation, this restricts their ability to move away from their vendor to a competitor.
On the other hand, it's pretty short-sighted because anyone with half a brain will tend to abstract away the proprietary pieces, or avoid the lock-in altogether, if it gets too egregious.

First, I don't find databases to be as, say, browsers or operating systems in terms of incompatibility. Anyone with a few hours of training can start doing selects, inserts, deletes and updates on any SQL database. Meanwhile, it's difficult to write HTML that renders identically on every browser or write system code for more than one OS. Generally, differences in SQL are related to performance or fairly esoteric features. The major exception seems to be date formats and functions.
Second, database developers generally are motivated to add features that differentiate their product from everyone else. Products like Oracle, MS SQL Server and MySQL are vast ecosystems that rarely cross-pollinate in practice. At my workplace, we use Oracle and MySQL, but we could probably switch over to 100% Oracle in about a day if needed or desired. So I care a lot about the shiny toys Oracle gives us with each release, but I don't even know what version of MySQL we are using. IBM, Microsoft, PostgreSQL and the rest might as well not exist as far as we are concerned. Having the features to get and keep customers and users is far more important than compatibility in the database world. (That's the positive spin on the "lock-in" answer, I suppose.)
Third, there are legitimate reasons for different companies to implement SQL differently. For instance, Oracle has a multi-versioning system that allows very fast and scalable consistent reads. Other databases lack that feature, but usually are faster inserting rows and rolling back transactions. This is a fundamental difference in these systems. It doesn't make one better than the other (at least in the general case), just different. One should not be surprised if the SQL ontop of a database engine takes advantage of its strengths and attempts to minimize its weaknesses. In fact, it would be irresponsible of the developers to not do this.

John: The standard actually covers lots of subjects, including identity columns, sequences, triggers, routines, upsert, etc. But of course, many of these standards-components may have been brought in place later than the first implementations; and this could be a reason why SQL standards compliance is somewhat low, generally.
Neall: There are actually areas where the SQL standard is ahead of the implementations. For example, it would be nice to have CREATE ASSERTION, but as far as I know, no DBMS implements assertions yet.
Personally, I believe that the closed nature of some ISO standards (like the SQL standard) is part of the problem: When a standard is not readily available online, it's less likely to be known by implementors/planners, and too few customers ask for compliance because they don't know what to ask for.

It's certainly effective lock-in, as 1800 says. But in fairness to the database vendors, the SQL standard is always playing catch-up to current databases' feature sets. Most databases we have today are of pretty ancient lineages. If you trace Microsoft SQL Server back to its roots, I think you'll find Ingres - one of the very first relational databases written in the '70s. And Postgres was originally written by some of the same people in the '80s as a successor to Ingres. Oracle goes way back, and I'm not sure where MySQL came in.
Database non-portability does suck, but it could be a lot worse.

Related

What are good alternatives to SQL (the language)? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I occasionally hear things about how SQL sucks and it's not a good language, but I never really hear much about alternatives to it. So, are other good languages that serve the same purpose (database access) and what makes them better than SQL? Are there any good databases that use this alternative language?
EDIT:
I'm familiar with SQL and use it all the time. I don't have a problem with it, I'm just interested in any alternatives that might exist, and why people like them better.
I'm also not looking for alternative kinds of databases (the NoSQL movement), just different ways of accessing databases.
I certainly agree that SQL's syntax is difficult to work with, both from the standpoint of automatically generating it, and from the standpoint of parsing it, and it's not the style of language we would write today if we were designing SQL for the demands we place on it today. I don't think we'd find so many varied keywords if we designed the language today, I suspect join syntax would be different, functions like GROUP_CONCAT would have more regular syntax rather than sticking more keywords in the middle of the parentheses to control its behavior... create your own laundry list of inconsistencies and redundancies in SQL that you'd like/expect to see smoothed out if we redesigned the language today.
There aren't any alternatives to SQL for speaking to relational databases (i.e. SQL as a protocol), but there are many alternatives to writing SQL in your applications. These alternatives have been implemented in the form of frontends for working with relational databases. Some examples of a frontend include:
SchemeQL and CLSQL, which are probably the most flexible, owing to their Lisp heritage, but they also look like a lot more like SQL than other frontends.
LINQ (in .Net)
ScalaQL and ScalaQuery (in Scala)
SqlStatement, ActiveRecord and many others in Ruby,
HaskellDB
...the list goes on for many other languages.
I think that the underlying theme today is that rather than replace SQL with one new query language, we are instead creating language-specific frontends to hide the SQL in our regular every-day programming languages, and treating SQL as the protocol for talking to relational databases.
Take a look at this list.
Hibernate Query Language is probably the most common. The advantage of Hibernate is that objects map very easily (nearly automatically) to the relational database, and the developer doesn't have to spend much time doing database design. Check out the Hibernate website for more info. I'm sure others will chime in with other interesting query languages...
Of course, there's plenty of NoSQL stuff out there, but you specifically mention that you're not interested in those.
"I occasionally hear things about how SQL sucks and it's not a good language"
SQL is over thirty years old. Insights about "which features make something a 'good' language and which ones make it a 'bad' one" have evolved more rapidly than SQL itself.
Also, SQL is not a language that conforms to current standards of "what it takes to be relational", so, SQL just isn't a relational language to boot.
"but I never really hear much about alternatives to it."
I invite you to ponder the possibility that you are trying to hear only in the wrong places (that is, the commercial DBMS industry exclusively).
"So, are other good languages that serve the same purpose (database access) and what makes them better than SQL?"
Date&Darwen describe the features that a modern data manipulation language must conform to in their "Third Manifesto", the most recent version of which is laid down in their book "Databases, Types & the Relational Model".
"Are there any good databases that use this alternative language?"
If by "good", you mean something like "industrial-strength", then no. The closest thing available would probably be Dataphor.
The Rel project offers an implementation for the Tutorial D language defined in "Databases, Types & The Relational Model", but the current prime goal of Rel is to be educational in nature.
My SIRA_PRISE project offers an implementation for "truly relational" data management, but I hesitate to also label it "an implementation of a language".
And of course, you might also look into some non-relational stuff, as some have proposed, but I personally dismiss non-relational data management as multiple decades of technological regression. Not worth considering, that is.
Oh, and by the way, a software system that is used to manage databases is not "a database", but "a DataBase Management System", "DBMS" for short. Just like a photograph is not the same thing as a camera, and if you are discussing cameras, and you want to avoid confusion, then you should be using the proper word "cameras" instead of "photograph".
Perhaps you're thinking of the criticism C. Date and his friends have uttered against existing relational databases and SQL; they say the systems and language aren't 100% relational, and should be. I don't really see any real problem here; as far as I can see you can have a 100% relational system, if you want, just by disciplining the way in which you use SQL.
What I personally keep running into is the lack of expressive power SQL inherits from its theoretical basis, relational algebra. One issue is the lack of support for the use of domain ordering, which you run into when you work with data marked by dates, timestamps, etcetera. I once tried to do a reporting application entirely in plain SQL on a database full of timestamps and it just wasn't feasible. Another is the lack of support for path traversal: most of my data look like directed graphs that I need to traverse paths in, and SQL can't do it. (It lacks "transitive closure". SQL-1999 can do it with "recursive subqueries" but I haven't seen them in actual use yet. There are also various hacks to make SQL cope but they're ugly.) These problems are also discussed by some of Date's writings, by the way.
Recently I was pointed at .QL which appears to address the transitive closure issue nicely, but I don't know whether it can resolve the issue with ordered domains.
Take a look at LINQ to SQL...
Tried it out a couple months ago and never looked back....
Direct answer: I don't think there's any serious contender out there. DBase and its imitators (Foxpro, Codebase etc) was a contender for a while, but I think they basically lost the database query language war. There have been many other database products that had their own query language, like Progress and Paradox and several others I've used whose names I don't remember and surely many more that I never heard of. But I don't think any other contender even came close to getting a non-trivial share of the market.
As simple proof that there is a difference between a database format and a query language, the last version of DBase I used -- many years ago now -- offerred both the "traditional" DBase query language and SQL, both of which could be used to access the same data.
Side ramble: I wouldn't say that SQL sucks, but it has many flaws. With the benefit of the years of experience and hindsight we now have, I'm sure one could design a better query language. But creating a better query language, and convincing people to use it, are two very different things. Would it be enough better to convince people that it was worth the trouble of learning. People have invested many years of their lives learning to use SQL effectively. Even if your new language is easier to use, there would surely be a learning curve. And how would you migrate your existing systems from SQL to the new language? Etc. It can be done, of course, just like C++, C#, and Java have largely overthrown COBOL and FORTRAN. But it takes a combination of technical superiority and good marketing to pull it off.
Still, I get a chuckle out of people who rush forward to defend SQL anytime someone criticizes it, who insist that any problem you have with SQL must be your own ineptitude in using it and not any fault of SQL, that you must just not have reached the higher plane of thingking necessary to comprehend its perfection, etc. Calm down, take a deep breath: We are insulting a computer language, not your mother.
Back in the 1980's, ObjectStore provided transparent object access. It was kind of like an RDBMS plus an ORM, except without all those extra leaky abstraction layers: it stored objects directly in the database.
So this alternative was really "no language at all", or perhaps "the language you're already using". You'd write C++ code and create or traverse objects as if they were native objects, and the database took care of everything as needed. Kind of like ActiveRecord but it actually worked as well as the ActiveRecord marketing blitzes claim. :-)
(Of course, it didn't have Oracle's marketing muscle, and it didn't have MySQL's zero-cost, so everybody ignored it. And now we try to replicate that with RDBMSs and ORMs, and some people try to argue that tables actually make sense for storing objects, and that writing giant XML file to tell your computer how to map objects to tables is somehow a reasonable solution.)
I think you might be interested in looking at Dataphor, which is an open-source relational development environment with its own database server (which speaks D), and the ability to derive user interfaces from its query language.
Also, it appears Ingres still supports QUEL, and it's open source.
The general movement these days is NoSQL; generally these technologies are:
Distributed "hashtables" that store data as key/value pairs
Document-oriented databases
Personally I think there is nothing wrong with SQL as long as it fits your needs. SQL is expressive and great for working with structured data.
SQL works fine for the domain for which it was designed — interrelated tables of data. This is generally found in traditional business data processing. SQL doesn't work that well when trying to persist a complex network of objects.
If your needs are to store and process relatively traditional data, use some SQL-based DBMS.
In response to your edit:
If you're looking for alternatives to the SQL DML for retrieving data from relational data stores, I've never heard of any serious alternative to SQL.
The knocks SQL gets are not, I think, so much against the language as opposed to the underlying data storage principles on which the language is based. People often confuse the language SQL with the relational data model on which RDBMSes are built.
Relational Databases are not the only kind of databases around. I should say a word about Object-Databases as I havn't seen it in responses from others. I had some experience with the Zope python framework that use ZODB for objects persistency instead of RDBMS (well, it's theoretically possible to replace ZODB by another database within zope but the last time I checked I didn't succeed to have it working, so can't be positive about that).
The ZODB mindset is really different, more like object programming that would happen to be persistent.
ORM can be seen as a kind of language
In a way I believe the Object-database model is what ORM are about : accessing persistent data through your usual object model. It's a kind of language and it's gaining some market share, but for now we don't see it as a language but as an abstraction layer. However I believe it would be much more efficient to use an ORM over an Object-database than over SQL (in other words performance of ORMs I happened to use using some SQL database as base layers sucked).
There are many implementations of SQL (SQL Server, mysql, Oracle, etc.), but there is no other language that serves the same purpose in the sense of being a general purpose language designed for relational data storage and retrieval.
There are object databases such as db4o, and there are similar so-called noSQL databases that refer to just about any data storage mechanism that doesn't rely on SQL, but most commonly open-source products like Cassandra based loosely on Google's Bigtable concept.
There are also a number of special-purpose database products like CDF, but you probably don't need to worry about those - if you need one, you'll know.
None of these are equivalent to SQL.
That doesn't mean they're "better" or "worse" - they're just not the same. Dennis Forbes wrote a great post recently breaking down a number of the strange claims surfacing against SQL. He maintains (and I agree) that these complaints originate largely from people and shops who have either picked the wrong tool for the job in the first place, or aren't using their SQL DBMS properly (I'm not even surprised anymore when I see another SQL database where every column is a varchar(50) and there's not a single index or key, anywhere).
If you are implementing yet another social networking site and aren't too concerned with ACID principles, by all means start looking into products such as db4o. If you are developing a mission-critical business system, however, I highly highly recommend that you think twice before joining the "SQL sucks" chorus. Do the research first, find out what features the various products can and cannot support.
Edit - I was busy writing my answer and didn't get the question update from a few minutes. Having said that, SQL is essentially inseparable from the DBMS itself. If you run a SQL database product, then you access it with SQL, period.
Perhaps you are looking for abstractions over the syntax; Linq to SQL, Entity Framework, Hibernate/NHibernate, SubSonic, and a host of other ORM tools all provide their own SQL-like syntax that is not quite SQL. All of these "compile down" to SQL. If you run SQL Server, then you can also write CLR Functions/Procedures/Triggers, which allows you to write code in any .NET language that will run inside the database; however, this isn't really a substitute for SQL, more of an extension to it.
I'm not aware of any full "language" that you can layer on top of a SQL database; short of switching to a different database product, you're eventually going to see SQL on the pipe.
SQL is de-facto.
Frameworks that try to shield developers from it have eventually created their own specific language (Hibernate HQL comes to mind).
SQL solves a problem fairly well. It is no more difficult to learn than a high level programming language. If you already know a functional language then it is a breeze to grasp SQL.
Considering the leading database vendors providing state of the art databases (Oracle and SQL Server) support SQL and have invested years into optimization engines, etc. and all leading data modelling software and change management software deals in SQL, I'd say it is the safest bet.
Also, there is more to a database than just queries. There is scalability, backup and recovery, data mining. The big vendors support a lot of things that even the new "cache" engines don't even consider.
Problems with SQL have motivated me to cook up a draft query language called SMEQL over at the Portland Pattern Repository wiki. Comments Welcome. It borrows ideas from functional programming and IBM's experimental Business System 12 language. (I originally called it TQL, but found later that name was taken.)
Within the .NET world, while it still has a SQL-esque feel to it, LINQ-to-SQL will allow you to have a good mix of SQL and in-memory .NET processing of your data. It also simplifies a lot of the lower-level data plumbing that nobody really wants to do.
If you want to see a database type of a completely different mindset, take a look at CouchDB. "Better" is obviously a relative requirement and this sort of non-relation database is "Better" but only in certain scenarios.
SQL the language is very powerful, and relational database management systems have been and still are a huge success. But there is a class of application that requires very high scalability and availability, but not necessarily a high degree of data consistency (eventual consistency is what matters). A variety of systems get better performance and scaling than an RDBMS by relaxing the need for full ACID compliant transactions. These have been named "NoSQL", but as others point out, this is a misnomer: that perhaps they should be called NoACID databases.
Michael Stonebraker covers this in The "NoSQL" Discussion has Nothing to Do With SQL.

What nosql means? can someone explain it to me in simple words?

in this post Stack Overflow Architecture i read about something called nosql, i didn't understand what it means, and i tried to search on google but seams that i can't get exactly whats it.
Can anyone explain what nosql means in simple words?
If you've ever worked with a database, you've probably worked with a relational database. Examples would be an Access database, SQL Server, or MySQL. When you think about tables in these kinds of databases, you generally think of a grid, like in Excel. You have to name each column of your database table, and you have to specify whether all the values in that column are integers, strings, etc. Finally, when you want to look up information in that table, you have to use a language called SQL.
A new trend is forming around non-relational databases, that is, databases that do not fall into a neat grid. You don't have to specify which things are integers and strings and booleans, etc. These types of databases are more flexible, but they don't use SQL, because they are not structured that way.
Put simply, that is why they are "NoSQL" databases.
The advantage of using a NoSQL database is that you don't have to know exactly what your data will look like ahead of time. Perhaps you have a Contacts table, but you don't know what kind of information you'll want to store about each contact. In a relational database, you need to make columns like "Name" and "Address". If you find out later on that you need a phone number, you have to add a column for that. There's no need for this kind of planning/structuring in a NoSQL database. There are also potential scaling advantages, but that is a bit controversial, so I won't make any claims there.
Disadvantages of NoSQL databases is really the lack of SQL. SQL is simple and ubiquitous. SQL allows you to slice and dice your data easier to get aggregate results, whereas it's a bit more complicated in NoSQL databases (you'll probably use things like MapReduce, for which there is a bit of a learning curve).
From the NoSQL Homepage
NoSQL is a fast, portable, relational database management system without arbitrary limits, (other than memory and processor speed) that runs under, and interacts with, the UNIX 1 Operating System. It uses the "Operator-Stream Paradigm" described in "Unix Review", March, 1991, page 24, entitled "A 4GL Language". There are a number of "operators" that each perform a unique function on the data. The "stream" is supplied by the UNIX Input/Output redirection mechanism. Therefore each operator processes some data and then passes it along to the next operator via the UNIX pipe function. This is very efficient as UNIX pipes are implemented in memory. NoSQL is compliant with the "Relational Model".
I would also see this answer on Stackoverflow.
Put simply, it means not using a relational database for data storage.
Here's a relevant article: http://www.computerworld.com/s/article/9135086/No_to_SQL_Anti_database_movement_gains_steam_
NoSql is the new database philosophy which talks about all the shortcomings of the relational database design, particularly the problems they have in scaling up for today's demanding web environments.
NoSql is quickly evolving into a movement with new tools, software and formats coming up as alternative to SQL.
RDBMS is as ubiquitous as OOP and while both of these design methodologies solve some problems wonderfully, they don't solve all.
So think of NoSql as the functional programmin of the database world.
Was this simple enough?
NoSQL is the idea that SQL-type databases don't satisfy the demands/requirements of a heavily-used database that requires transactions be reliable and failsafe (or close to it). This ties into the ideas of ACID and CAP, both things worth looking into but not something to lose sleep over unless you run a really popular site that is transaction-heavy (ie Amazon or Ebay). To get a great start on these subjects, I suggest:
http://www.eflorenzano.com/blog/post/my-thoughts-nosql/
and
http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
Something everyone considering a "nosql" approach should consider:
(I shan't risk putting the image into this post as it contains a curse word, and I don't want offensive flags. So clicker beware -- there's an f-word in there. Only click if you have a sense of humor.)
http://browsertoolkit.com/fault-tolerance.png
Found this nice article about no-sql
and this as well:
NoSQL, Yes Search

Choosing the right database: MySQL vs. Everything else

It would seem these days that everyone just goes with MySQL because that's just what everyone goes with. I'm working on a web application that will be handling a large quantity of incoming data and am wondering if I should "just go with MySQL" or if I should take a look at other open-source databases or even commercial databases?
EDIT: Should have mentioned, am looking for optimal performance, integration with ruby + rails running on debian 5 and money is tight although if it will save money in the long run I would consider making an investment into something more expensive.
I've posted this before, but I have no reason to change this advice:
MySQL is easier to start using.
Nicer UI tools. Faster, if you don't use ACID. More tolerant of invalid data. Autoincrement columns are as easy as typing autoincrement. Permissions aren't as tied to the file systems and OS users. Setting a delimiter is easier than using PG's "dollar sign quoting" when writing a stored proc. In MySQL, you connect to all databases, not just one at a time.
Postgres (PG) is much more standards compliant, but it's uglier and more complicated, especially from a UI perspective. It used to require manual vacuuming, and actually enforces referential integrity (which is a great thing that can be a pain in the ass). Autoincrement is much more flexible, but requires sequences (which can me masked by using serial), and wait, what's an OID?
So if you don't really know or care much about databases, data validity, ACID compliance, etc, but you do care about ease and speed, you tend to go with MySQL.
Too many (not all, but many) "web programmers" know a lot about "web 2.0" or PHP or Java, but don't know much about database theory or practice ("an index? what's that?"). They tend to see a database as just a fancy hashtable or bag of data, and indeed one that's not anywhere as dynamically changeable or forgiving as a hashtable.
For these folks, MySQL -- because until 5.0 it wasn't really an RDBMS, and in many ways still is not -- is a godsend. It's "faster" than the competition, and doesn't "waste time" on "esoteric" database stuff a web programmer doesn't want, understand, or see the value of.
For somebody with a database background, on the other hand, MySQL is a minefield: stuff that should work (complicated views, group bys, order bys in group bys) may work or may if you're lucky crash the server, or if you're unlucky just give results with incorrect data.
I've spent days working around some of these things in admittedly complicated by not extraordinarily complex views and group bys.
And MySQL isn't really faster. If you're using InnoDb tables for ACID (or just because at more than 30 Million rows, MyISAM tables tend to get crappy), yes a straight one-table select is probably faster than in PG. But add in joins, and PG is suddenly significantly faster. (MySQL is especially bad at outer joins.)
In summary: if to you the database is a bag, if you never intend to do data mining or reporting, if you're mostly interested in serving up big hunks of text with few relations or updates -- that is, if you're using a database to power a blog, MySQL is a great choice.
But if you're actually managing data, if you understand that data lives longer and is more valuable to a business than front-end programs and middle-tier business rules , if you need the features of a real database, use PG.
A "web programmer" who has decided all his table structures can be auto-generated by Hibernate (or some other ORM) looks at that and says, "too complicated" and "I bet complicated means more cost and slower speed" and so he goes with MySQL.
As I said, PG is far superior, and I hate mucking with MySQL's bizarre bugs, and I think that overall PG performance is probably better than MySQL for any even slightly complicated query.
But MySQL makes things look (deceptively) simple, so you get a lot of people who don't really understand database design figuring that MySQL is a great choice.
Use PG. It's consistent, it's reliable, it's standards-compliant, it's faster on (even moderately) complicated queries, it doesn't completely throw off your schedule with weird bugs.
I think PostgreSQL is a very viable alternative to MySQL. It's much more Oracle-like.
Personally, I try to avoid MySQL whenever I can for the following reasons:
The default storage engine MyISAM lacks Foreign Key support. Innodb does, however it does not support Foreign Keys to MyISAM tables for obvious reasons.
If I attempt to insert invalid data, MySQL will happily change it for me.
Illegal DateTime, Date, or Timestamp values are convert to "zero": http://dev.mysql.com/doc/refman/5.1/en/datetime.html
Varchar and Char type columns: http://dev.mysql.com/doc/refman/5.1/en/char.html
Numeric Datatypes depending on SQL Strict Mode: http://dev.mysql.com/doc/refman/5.1/en/numeric-types.html
These things may not be of any importance to some people, but when it comes to quality of data, I would rather use something else.
Well, there are may differences between the RDBMS of the world.
Take a look at http://en.wikipedia.org/wiki/Comparison_of_relational_database_management_systems#Fundamental_features
Using this as a guide you should be able to narrow down your choices a lot.
A few things to keep in mind:
SQL Server 2005 Express is limited to a 4GB file size, but has excellent support int the .NET and Java languages.
MySQL will run on both Windows and Linux, and many languages support it (including .NET and Java) with external libraries.
SQLite is supported by effectively every operating system, and can be distributed as an integrated part of your application.
Firebird the open sourced and branched version of Borland's Interbase is pretty good, it works happily on most (all?) flavours of Linux and is very performant. I am not a RoR guy so I don't know the details, but I do have a friend in NZ who uses Firebird with all his RoR projects, it definitely works with RoR, and works well.
Edit: Found a link to a Firebird Rails Adapter here
Please be a little wary of the idealistic points-of-view put forth by people who may have axes to grind. MySQL is a capable, robust and scalable solution for many problems, but this is usually not so right out-of-the-box as its defaults are extremely conservative. Like any database product, you will need to spend time tuning its installation for the desired performance. You will also need to spend some time accomodating its limitations, which is also true for any database you choose.
MySQL's popularity is partly self-generated: a lot of hosting providers provide it, so a lot of people use it.
Mysql is great, and mssql is great. I haven't used anything else. I would say if you are completely on the fence, go with the technology stack you are strongest with. I have a good amount of c#, asp.net, and other Microsoft stack experience, so it is pretty natural for me to specialize in mssql. If you are more familiar with *nix, php, etc, you may be more at home sticking to an open source stack. You can certainly mix and match the two stacks, but sticking to one world or the other can avoid some pain for you.
Like duffymo I also recommend PostgreSQL. It is very nice from a developer perspective:
work on many platforms (both unix-based and Windows), has vary stable interfaces to any laguage/environment I worked (Windows, Linux, Delphi, Java, Perl, Python). Stored procedure language: PLPGSQL is also easy and powerful. User support (newsgroups, lists, SO) is nice and helpful.
"It would seem these days that everyone just goes with MySQL because that's just what everyone goes with." If MySQL is the only thing that people use then why are Oracle and MSSQL still around?
The debate as to which database engine to go for can be talked about until the cows come home. I personally have always found one constant in choosing a databse engine. The one you can afford is generally the one you go for.
If you can justify spending XXX on a database then you probably know the reasons already for choosing it.
MySQL is easy to install and it works fine without special settings. With the proper approach MySQL can flexibly adjust to your needs. But there are also some pitfalls: in some cases it may slow down your project, no matter how well you have tuned the DBMS and the data structure.
MySQL is for you if:
you do not want to delve into DBMS settings;
you think structurally;
integration with MySQL is in any programming language, framework, CMS, CMF and so on;
you need DBMS to manage small structural data (up to 1 or 2 gigabytes).
You mentioned, that you deal with large quantity of incoming data. Probably this guide would be helpful

How Important is SQL Portability?

It seems to me, from both personal experience and SO questions and answers, that SQL implementations vary substantially. One of the first issues for SQL questions is: What dbms are you using?
In most cases with SQL there are several ways to structure a given query, even using the same dialect. But I find it interesting that the relative portability of various approaches is frequently not discussed, nor valued very highly when it is.
But even disregarding the likelihood that any given application may or not be subject to conversion, I'd think that we would prefer that our skills, habits, and patterns be as portable as possible.
In your work with SQL, how strongly do you prefer standard SQL syntax? How actively do you eschew propriety variations? Please answer without reference to proprietary preferences for the purpose of perceived better performance, which most would concede is usually a sufficiently legitimate defense.
I vote against standard/vendor independent sql
Only seldom the database is actually switched.
There is no single database that fully conforms to the current sql standard. So even when you are standard conform, you are not vendor independent.
vendor differences go beyond sql syntax. Locking behaviour is different. Isolation levels are different.
database testing is pretty tough and under developed. No need to make it even harder by throwing multiple vendors in the game, if you don't absolutly need it.
there is a lot of power in the vendor specific tweaks. (think 'limit', or 'analytic functions', or 'hints' )
So the quintessence:
- If there is no requirement for vendor independence, get specialised for the vendor you are actually using.
- If there is a requirement for vendor independence make sure that who ever pays the bill, that this will cost money. Make sure you have every single rdbms available for testing. And use it too
- Put every piece of sql in a special layer, which is pluggable, so you can use the power of the database AND work with different vendors
- Only where the difference is a pure question of syntax go with the standard, e.g. using the oracle notation for (outer) joins vs the ANSI standard syntax.
We take it very seriously at our shop. We do not allow non-standard SQL or extensions unless they're supported on ALL of the major platforms. Even then, they're flagged within the code as non-standard and justifications are necessary.
It is not up to the application developer to make their queries run fast, we have a clear separation of duties. The query is to be optimized only by the DBMS itself or the DBAs tuning of the DBMS.
Real databases, like DB2/z :-), process standard SQL plenty fast.
The reason we enforce this is to give the customer choice. They don't like the idea of being locked into a specific vendor any more than we do.
In my experience, query portability turns out to be not so important. We work with various data sources (mainly MSSQL and MySQL), but we know which data is stored where and can optimize accordingly. Since we control the systems, we decide when - if ever - structures are moved and queries need to be rewritten.
I also like to use certain other server-specific functionality, such as query notification in SQL Server, which MySQL doesn't offer. So there, again, we use it when we can and don't worry about portability.
Furthermore, parts of our apps need to query schema information and act on it. Here, again, we have server-specific code for the different systems, instead of trying to restrict ourselves to the lowest common denominator.
There is no clear answer whether SQL portability is desirable or not - it really depends a lot on the situation, such as the type of application.
If the application is going to be a service - ie there will only ever be you hosting it, then obviously nobody but you will care whether your SQL is portable enough, so you could safely ignore it as long as you have no specific plans to drop support for your current platform.
If the application is going to be installed at a number of sites, which each have their own established database systems, obviously SQL portability is very important to people. It lets you widen your potential market, and may give a bit of piece of mind to clients who are on the fence in regards to their database system. Whether you want to support that, or you are happy selling only to, for instance, Oracle customers, or only to MySQL/PostgreSQL customers, for example, is up to you and what you think your market is.
If you are coding in PHP, then the vast majority of your potential customers are probably going to expect MySQL. If so, then it's not a big deal to assume MySQL. Or similarly if you are in C#/.NET then you could assume Microsoft SQL Server. Again, however, there is a flip side because there may exist a small but less competitive market of PHP or .NET users who want to connect to other database systems than the usual.
So I would largely regard this as a market research question, unless as in my first example you are providing a hosted service where it doesn't matter to users, in which case it is for your own convenience only.

What relational database innovations have there been in the last 10 years

The SQL implementation of relational databases has been around in their current form for something like 25 years (since System R and Ingres). Even the main (loosely adhered to) standard is ANSI-92 (although there were later updates) is a good 15 years old.
What innovations can you think of with SQL based databases in the last ten years or so. I am specifically excluding OLAP, Columnar and other non-relational (or at least non SQL) innovations. I also want to exclude 'application server' type features and bundling (like reporting tools)
Although the basic approach has remained fairly static, I can think of:
Availability
Ability to handle larger sets of data
Ease of maintenance and configuration
Support for more advanced data types (blob, xml, unicode etc)
Any others that you can think of?
Hash joins
Cost-based optimizers (pretty much turned query-writing on its head)
Partitioning (enables much better VLDB management)
Parallel (multi-threaded) query processing
Clustering (not just availability but scalability too)
More flexibility in SQL as well as easier integration of SQL with 3GL languages
Better diagnostics capabilities
Analytic functions like RANK
I'm not sure if you want to include even vendor-specific innovations (and nor am I entirely certain that other database engines can't already do this), but SQL Server 2005 adds recursive transact-sql queries to their language. I find them amazingly useful for iterating over hierarchical data. I believe 2008 adds some new functionality related to hierarchical data, but I haven't looked that closely.
SELECT (invoiceprice * detailweight) / SUM(weight) OVER(PARITTION BY invoice) as weighted, *
FROM tblInvoiceDetails
Windowed functions are awesome for doing things like weighted averages, and other things that previously required CURSORS.
Well one could possibly suggest that a lack of movement for 15 years is not just a sign of lack of innovation, but a sign that databases are almost perfect! Many people try to do things in code that are better done in databases that have been refined since the 1960's to run as fast and as efficiently as possible.
I would say the last ten years (1998-2008) have seen open source RDBMS products become viable in mainstream deployments. Most Fortune 500 companies now use MySQL or PostgreSQL or another open source RDBMS somewhere in their organization, even if they also use one of the commercial, closed-source RDBMS brands.
This isn't a technical advancement, but it's noteworthy nevertheless because the availability of a stable, open-source RDBMS engine enables many other innovative projects.
I realize that both MySQL and PostgreSQL were available as early as 1995, but I would argue that they weren't mainstream for several years after that.
Along with your list of more advanced data types (blob, xml, unicode etc) you should include spatial types.
The PostGIS extension for PostgreSQL came out in 2001, but now all the major vendors have implemented spatial objects and spatial SQL.
Along with the rise of Google Maps, Bing Maps, and OpenLayers the ability to display geospatial data and run spatial queries without middleware has had a huge effect on the web and data analysis.
I think most of the progress has been in the realm of performance - query profilers and clusters.
I think that the area of biggest innovation has probably been in data replication - for availability and reliability. Most of the other areas are more incremental. By specifying a decade, you omit the ORDBMS stuff - extensibility; that appeared in 1997.