How to select an SQL database? [closed] - sql

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 3 years ago.
We're living in a golden age of databases, with numerous high quality commercial and free databases. This is great, but the downside is there's not a simple obvious choice for someone who needs a database for his next project.
What are the constraints/criteria you use for selecting a database?
How well do the various databases you've used meet those constraints/criteria?
What special features do the databases have?
Which databases do you feel comfortable recommending to others?
etc...

I would first think about what the system requirements are for data access, data security, scalability, performance, disconnected scenarios, data transformation and data sizing.
On the other side, consider also the experience and background of developers, operators, platform administrators.
You should also think about what constraints you have regarding programming languages, operating systems, memory footprint, network bandwidth and hardware.
Last, but not least, you have to think about business issues like budget for licences, support, operation.
After all those considerations you should end up with just a couple of options and the selection should be easier.
In other words, select the technology that best suits the constraints and needs of your organization and project.
I certainly think you are right in saying that it is not an obvious choice given the large number of alternatives, but this is the only way I know to narrow them down to the ones that are really feasible for your project.

My selection criteria (mainly programming centric):
Maintenance: How are updates/hotfixes installed?
Transaction control: How is it implemented?
Are Stored Procedures supported?
Can you use exception handling in Stored Procedures?
Costs
As a bonus: Can you use recursion in Stored Procedures? (E.g. in SQL Server 2000 the recursion stops after 32 nested calls IIRC)

For most people in a corporate environment the choice comes down to "the one we have".
Since you seem to be fortunate enough to have a choice, I'll take a quick run through the questions and maybe pose a few more at the end.
The biggest criterion may be cost. Do you want/are you prepared to pay for your DBMS platform? If not, then Oracle, MS SQL Server, Sybase and others are probably out, although if you're not building a commercial app then there may be some wiggle room. Also, platform - can you run the software on your hardware?
Other dimensions for consideration might include expected number of concurrent connections, transactional vs mostly reads, size, availability and I guess lots of others.
"Special features" are, in the main, to be avoided - in my cynical world-view they're intended to lock you into a platform. So something like Oracle's PL/SQL is a feature that, while powerful (and likely to mean the need for extra CPU power at more licensing cost) is not portable. If you expect extremely high volumes then partitioning may be useful, I suppose.
I have worked with Oracle, MS SQL Server, MySQL, PostgreSQL, SQLite and Sybase that I can think of. I'd happily recommend all but Sybase, about which I have some concerns these days (I could easily be wrong, but personally I think the money could be better spent elsewhere), but not all for the same applications.
Ideally, I like to have the warm feeling that it doesn't really matter what DB platform I'm using because I can port easily. With a good abstraction layer between data and business logic, I should be able to develop locally against, say, the excellent SQLite and implement painlessly on, for example, Postgres. With something like ActiveRecord from Rails coupled with a little awareness of things like differences in reserved words, this is almost completely cost-free.
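To make the "abstraction layer between data and business logic" idea concrete, here is a minimal sketch in Python. The names `UserStore`, `SqliteUserStore` and `register` are hypothetical, invented for illustration; the point is only that the business logic sees an interface, so a Postgres-backed class with the same methods could be swapped in later.

```python
import sqlite3
from typing import Optional, Protocol


class UserStore(Protocol):
    """Hypothetical interface: all the business logic knows about persistence."""

    def add(self, name: str) -> int: ...
    def find(self, user_id: int) -> Optional[str]: ...


class SqliteUserStore:
    """SQLite-backed implementation, convenient for local development."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS users "
            "(id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    def add(self, name: str) -> int:
        cur = self.conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        self.conn.commit()
        return cur.lastrowid

    def find(self, user_id: int) -> Optional[str]:
        row = self.conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


def register(store: UserStore, name: str) -> int:
    """Business logic: depends only on the UserStore interface."""
    return store.add(name.strip())


store = SqliteUserStore()
uid = register(store, "  alice ")
```

Only the store class contains SQL, so dialect differences (reserved words, type names) stay in one place.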

Surely the most compelling factor is the expertise of you or your team...or the pool of resource you are likely to hire in the future. I would tend to go with the grain most of the time, using MySQL in a LAMP team and SQL Server in a MS team, since either of these products is capable of doing everything necessary even in a high-load environment.
The benefits of any other database are going to be marginal compared to the pain of learning how to use it well. The only exception to this, in my opinion, would be in a high-demand environment where:
a. the obvious choice has been tried and is failing
b. the benefits of scaling multiply the marginal benefit to such a degree that it will be worth the cost of using something unexpected.
I would assume the need to hire at least two and preferably three excellent DBAs with long term familiarity with the new database.
And first I would try to hire them for the technology that was failing, because it is more likely to be the way it's used than the technology itself that is causing the problem.

The existing answers are great. It's worth bearing in mind that Oracle now has an XE version of its 10g database which is available for free and comes with Application Express, a great web-based development environment.
It is limited to 4 GB of disk storage and 1 GB of RAM, and it uses only one CPU. This is enough to run smaller systems, though, and it can be upgraded easily at a later date if necessary. Oracle can be one of the toughest to learn but is also one of the best to have on your CV :-)
I think SQL Server from Microsoft also has a 'starter' type database. Don't discount the commercial products - if you are going to bet your company on a database technology I would rather be using a product from Oracle or Microsoft personally. That's not to say there is anything wrong with Open Source.
Spend a while evaluating them :-)

Linux, Web Hosted - MySQL (PostgreSQL maybe)
Mainstream SME - MS SQL
Big Iron (banking etc) - Oracle
Thinking about anything other than those three is masturbation - any of the other databases becomes a discussion about niche products to solve particular problems that you probably haven't encountered yet. If you choose anything other than the three above you will -
Struggle to find people to work on the project or keep the database going
Struggle to motivate your decision without an academic discussion
Someone will curse you, your ancestors and your lineage a few years down the line - and replace your choice anyway.
Niche databases are not where architectural strides are made - it is technologies like middleware, messaging, cloud services etc where you can afford to (and should) go out on a limb to find good products.

Related

NoSQL databases (MongoDB) versus relational databases (MySQL)

I've been spending a considerable amount of time learning MongoDB as part of the MEAN stack (I'm new to the MEAN stack), and I feel like I'm just barely starting to get the hang of it. Recently, however, my supervisor at work (who's an experienced programmer) suggested I learn relational databases (it could be that it's used by more companies), and I have taken his suggestion to heart since I'm only a junior developer, and would like to move up soon. My personal goal/project is to build a social networking site where one group of users have the ability to search for another group of users (and most likely charge a membership fee). Would the database matter for a project like this? I would love to kill two birds with one stone by learning something (and learning it really well) that could be used in my personal project, and in a professional job.
I'm certainly open to learning more than one language, but I need a starting point, so I need something that will help me accomplish my personal goal. Since I've spent a considerable amount of time with JavaScript (as opposed to, say, Python), having to learn relational databases concerns me. Are there any downsides to just sticking to MongoDB?
Thanks in advance!
The first thing you need is to assess what type of data you are going to deal with (will it be structured and easily represented by tables, or will it be more of the non-structured type?).
If structured, RDBMS is the way to go.
Incidentally, as far as I know, the first versions of Facebook were implemented using MySQL (the type of DBMS your boss was referring to).
Farid, as you're a junior programmer, I'd suggest you learn as much as possible about both approaches.
The knowledge acquired on one of them will also help you get the most from the other (as they both concern data, how to structure it and how to query it).
Also, in the day-to-day job (unless you specialize only in large sites where scalability is the first concern) you might encounter RDBMSs more frequently than NoSQL choices.
Both technologies have pros and cons and, unless you know both sides (at least from reading the documentation, if not by direct experience), you might go for one solution when the other would have been preferable.
Well, everyone has concerns about learning SQL (and using JOIN), but:
One day, whether you expect it or not, you will have to use it (or at least understand it), so it is probably worth taking the time to learn SQL (it's the most widespread).
MongoDB is good if you are developing a REST service, but if you want to build a social-network-like project, it may be better to look at a graph database.
Perhaps that video can help; it is an overview of SQL vs NoSQL.

How did SQL become the dominant database language?

For most programming tasks, you've got quite the selection of languages to choose from, and good strong communities behind plenty of them. But when you need to work with a database, there's really only one viable choice these days: SQL. Sure, there are different companies with different implementations and dialects, but you're still looking things up with
SELECT columns
FROM table
JOIN other_table ON criteria
WHERE other_criteria
It wasn't always this way, though. As late as the early 90s, there was no single obvious way to interact with a database. But today, there is. And with the way computer languages tend to proliferate rather than converge, I find that a bit odd. What historical and technical factors led to SQL's almost complete dominance of the database access domain?
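For readers who want to see that query shape actually run, here is a small sketch using Python's built-in `sqlite3` module. The `departments` and `employees` tables and their rows are made up for illustration and are not part of the question.

```python
import sqlite3

# In-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER, salary INTEGER
    );
    INSERT INTO departments VALUES (1, 'Engineering'), (2, 'Sales');
    INSERT INTO employees VALUES
        (1, 'Ann', 1, 120), (2, 'Bob', 2, 80), (3, 'Cy', 1, 95);
    """
)

# SELECT columns FROM table JOIN other_table ON criteria WHERE other_criteria
rows = conn.execute(
    """
    SELECT e.name, d.name
    FROM employees AS e
    JOIN departments AS d ON d.id = e.dept_id
    WHERE e.salary > ?
    ORDER BY e.name
    """,
    (90,),
).fetchall()
```

The same statement would be accepted, give or take minor dialect details, by most SQL engines, which is the convergence the question is asking about.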
It's like this Winston Churchill quote:
Indeed, it has been said that democracy is the worst form of government except all those other forms that have been tried from time to time.
There were alternative database technologies before 1970 when the relational model was first proposed. There have been alternatives the whole time since then, and there are new alternatives today.
But of all the alternatives, no solution besides SQL provides as good a balance for:
Widespread standardization
Popular and long-lived products such as Oracle
Plays nicely with many application programming languages
Support for formal data modeling, strong data integrity, ACID transactions
Here's a reference from the Codd Wikipedia article - some detail on how SQL 'won out'.
Committee on Innovations in Computing and Communications: Lessons from History: The Rise of Relational Databases.
Edgar F Codd started the madness.
Enjoy!
Codd and Churchill aside, SQL isn't a horribly bad language for defining and querying table-based datasets. As another general said, "It got there the firstest with the mostest."
One factor is that data persists. It is a lot harder to replace/migrate a company's data than its applications. Applications can come and go, coded in the latest 'flavor of the month' language, but the database platform lives on. This is a bit like a QWERTY effect. While the QWERTY keyboard layout is known to be inefficient, it persists because there would be a massive cost in switching to anything else.
Secondly, there is massive market domination by Oracle and IBM (and more recently Microsoft). While they might not agree on every detail, none of them has seen a benefit to a non-SQL interface to their databases. I used Ingres back in the early 90s when its QUEL was being pushed out by SQL.
Thirdly, there's a benefit to the application developers (especially the likes of SAP and Oracle) to have a standard(ish) platform to sit on.
I suppose the flip side to this question is why do we need/want so many different programming languages.

How necessary or convenient is it to write portable SQL?

Time and again, I've seen people here and everywhere else advocating avoidance of nonportable extensions to the SQL language, this being the latest example. I recall only one article stating what I'm about to say, and I don't have that link anymore.
Have you actually benefited from writing portable SQL and dismissing your dialect's proprietary tools/syntax?
I've never seen a case of someone taking pains to build a complex application on MySQL and then saying "You know what would be just peachy? Let's switch to (PostgreSQL|Oracle|SQL Server)!"
Common libraries in -say- PHP do abstract the intricacies of SQL, but at what cost? You end up unable to use efficient constructs and functions, for a presumed glimmer of portability you most likely will never use. This sounds like textbook YAGNI to me.
EDIT: Maybe the example I mentioned is too snarky, but I think the point remains: if you are planning a move from one DBMS to another, you are likely redesigning the app anyway, or you wouldn't be doing it at all.
Software vendors who deal with large enterprises may have no choice (indeed that's my world) - their customers may have policies of using only one database vendor's products. To miss out on major customers is commercially difficult.
When you work within an enterprise you may be able to benefit from the knowledge of the platform.
Generally speaking the DB layer should be well encapsulated, so even if you had to port to a new database the change should not be pervasive. I think it's reasonable to take a YAGNI approach to porting unless you have a specific requirement for immediate multi-vendor support. Make it work with your current target database, but structure the code carefully to enable future portability.
The problem with extensions is that you need to update them when you're updating the database system itself. Developers often think their code will last forever but most code will need to be rewritten within 5 to 10 years. Databases tend to survive longer than most applications since administrators are smart enough to not fix things that aren't broken so they often don't upgrade their systems with every new version. Still, it's a real pain when you upgrade your database to a newer version yet the extensions aren't compatible with that one and thus won't work. It makes the upgrade much more complex and demands more code to be rewritten. When you pick a database system, you're often stuck with that decision for years. When you pick a database and a few extensions, you're stuck with that decision for much, much longer!
The only case where I can see it necessary is when you are creating software the client will buy and use on their own systems. By far the majority of programming does not fall into this category. To refuse to use vendor-specific code is to ensure that you have a poorly performing database, as the vendor-specific code is usually written to improve the performance of certain tasks over ANSI standard SQL and is written to take advantage of the specific architecture of that database. I've worked with databases for over 30 years and never yet have I seen a company change their backend database without a complete application rewrite as well. Avoiding vendor-specific code in this case means that you are harming your performance for no reason whatsoever most of the time.
I have also used a lot of different commercial products with database backends through the years. Without exception, every one of them was written to support multiple backends and, without exception, every one of them was a miserable, slow dog of a program to actually use on a daily basis.
In the vast majority of applications I would wager there is little to no benefit and even a negative effect of trying to write portable SQL; however, in some cases there is a real use case. Let's assume you are building a Time Tracking Web Application. And you'd like to offer a self-hosted solution.
In this case your clients will need to have a DB Server. You have some options here. You could force them into using a specific DBMS and version, which could limit your client base. If you can support multiple DBMSs then you have a wider pool of potential clients who can use your web application.
If you're corporate, then you use the platform you are given
If you're a vendor, you have to plan for multiple platforms
Longevity for corporate:
You'll probably rewrite the client code before you migrate DBMS
The DBMS will probably outlive your client code (Java or C# against an '80s mainframe)
Remember:
SQL within a platform is usually backward compatible, but client libraries are not. You are forced to migrate if the OS can not support an old library, or security environment, or driver architecture, or 16 bit library etc
So, assume you had an app on SQL Server 6.5. It still runs with a few tweaks on SQL Server 2008. I bet you're not using the same client code...
There are always some benefits and some costs to using the "lowest common denominator" dialect of a language in order to safeguard portability. I think the dangers of lock-in to a particular DBMS are low, when compared to the similar dangers for programming languages, object and function libraries, report writers, and the like.
Here's what I would recommend as the primary way of safeguarding future portability. Make a logical model of the schema that includes tables, columns, constraints and domains. Make this as DBMS independent as you can, within the context of SQL databases. About the only thing that will be dialect dependent is the datatype and size for a few domains. Some older dialects lack domain support, but you should make your logical model in terms of domains anyway. The fact that two columns are drawn from the same domain, and don't just share a common datatype and size, is of crucial importance in logical modelling.
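One way to picture "modelling in terms of domains" is the sketch below. Everything in it (the `DOMAINS` and `TABLES` dictionaries, the domain names, the type choices per dialect) is a hypothetical illustration rather than a recommended mapping: columns name a domain, and only the domain-to-datatype table is dialect dependent.

```python
# Logical model: each column is declared in terms of a domain, not a raw type.
# The dialect-specific datatype lives in exactly one place per domain.
DOMAINS = {
    "customer_id": {"postgresql": "BIGINT", "sqlite": "INTEGER"},
    "money": {"postgresql": "NUMERIC(12,2)", "sqlite": "NUMERIC"},
    "short_text": {"postgresql": "VARCHAR(100)", "sqlite": "TEXT"},
}

TABLES = {
    "customers": [("id", "customer_id"), ("name", "short_text")],
    "invoices": [("customer", "customer_id"), ("amount", "money")],
}


def create_table_sql(table: str, dialect: str) -> str:
    """Render the physical DDL for one table in one dialect."""
    cols = ", ".join(
        f"{col} {DOMAINS[dom][dialect]}" for col, dom in TABLES[table]
    )
    return f"CREATE TABLE {table} ({cols})"


def columns_in_domain(domain: str) -> list:
    """List every (table, column) drawn from the given domain."""
    return [
        (table, col)
        for table, cols in TABLES.items()
        for col, dom in cols
        if dom == domain
    ]
```

Here `customers.id` and `invoices.customer` are linked by sharing the `customer_id` domain, not merely by happening to have the same datatype, which is the distinction the answer stresses.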
If you don't understand the distinction between logical modeling and physical modeling, learn it.
Make as much of the index structure portable as you can. While each DBMS has its own special index features, the relationship between indexes, tables, and columns is just about DBMS independent.
In terms of CRUD SQL processing within the application, use DBMS specific constructs whenever necessary, but try to keep them documented. As an example, I don't hesitate to use Oracle's "CONNECT BY" construct whenever I think it will do me some good. If your logical modeling has been DBMS independent, much of your CRUD SQL will also be DBMS independent even without much effort on your part.
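As a sketch of what "use it, but keep it documented" might look like, the hierarchical query below is isolated in one named constant. The `staff` table is hypothetical. The Oracle `CONNECT BY` form appears only as a comment and is not executed; the runnable version uses a recursive common table expression, which SQLite accepts, as one possible equivalent if the code ever had to move.

```python
import sqlite3

# Oracle-specific form, kept as documentation only (not executed here):
#   SELECT name FROM staff
#   START WITH manager_id IS NULL
#   CONNECT BY PRIOR id = manager_id
#
# Recursive CTE doing the same walk from the root down through the reports.
ORG_CHART_SQL = """
WITH RECURSIVE chain(id, name, depth) AS (
    SELECT id, name, 0 FROM staff WHERE manager_id IS NULL
    UNION ALL
    SELECT s.id, s.name, chain.depth + 1
    FROM staff AS s JOIN chain ON s.manager_id = chain.id
)
SELECT name, depth FROM chain ORDER BY depth, name
"""


def org_chart(conn: sqlite3.Connection) -> list:
    """Return (name, depth) for everyone reachable from the root."""
    return conn.execute(ORG_CHART_SQL).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO staff VALUES
        (1, 'Root', NULL), (2, 'Mid', 1), (3, 'Leaf', 2), (4, 'Other', 1);
    """
)
chart = org_chart(conn)
```

Keeping the dialect-specific statement next to its alternative means the obstacle at migration time is already catalogued.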
When it comes time to move, expect some obstacles, but expect to overcome them in a systematic way.
(The word "you" in the above is to whom it may concern, and not to the OP in particular.)

What does the term legacy database mean? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 months ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
I read this term a lot. What exactly is a legacy database? I ask because I had thought it meant an old database like dbase or rdb, but I don't think I'm right.
When looking at RoR or Django and "legacy database" integration, what does legacy database really mean? Is it different than a generic term "legacy database"?
In the general context, it can refer to any of the older database technologies.
In a more specific context, it can refer to a database system that was inherited by a team from previous project owners.
legacy: anything from the past that keeps coming around to haunt you.
A legacy database is generally something that you will have to inherit and base some of your design decisions around. Most companies that put out work may already have some other (usually horrible) solution and you need to give them a bigger and better product...
BUT
It has to work with all of their old legacy data. The company is not going to want to manage two different applications just so they can keep all their old records. You will need to develop your solution to be able to migrate the data from the legacy system over into your system. This can have a massive impact on the overall design of the new database, because it cannot stray too far from the previous without introducing a lot of problems in terms of data integrity.
It's usually derogatory in my experience:
Something no-one wants to touch in case it breaks
Databases that can't be maintained (say that SQL 6.5 box lying around)
Someone else's badly designed and implemented database
Something that someone is trying to replace
Supported by the 93 year old weirdo
If it's in-use but still has maintenance or development activities, it can't be legacy...
Edit:
Given the age of the SQL language and the RDBMS, everything is legacy (including my new system due next year) compared to the software listed. At what point does Ruby turn legacy from the database perspective..?
We mostly use the term 'legacy database' as a db schema we can not 'easily' modify without breaking other software/systems using this schema.
This sums it up pretty well.
[edit] Broken link. Here's the quote from FOLDOC:
Legacy System -- A computer system or application program which continues to be used because of the cost of replacing or redesigning it and often despite its poor competitiveness and compatibility with modern equivalents. The implication is that the system is large, monolithic and difficult to modify.
If legacy software only runs on antiquated hardware the cost of maintaining this may eventually outweigh the cost of replacing both the software and hardware unless some form of emulation or backward compatibility allows the software to run on new hardware.
Flat file, hierarchical, and network databases are usually referred to as legacy databases. They represent the ways people used to organize information in prehistoric times, about 30 years ago.
Legacy is used to denote the old thing. A legacy database is something which continues to be used because of the cost of replacing and redesigning it.
In a general context it refers to old, inherited code, typically COBOL code.
It is used for code that is still in use for historical reasons.
It also applies to DB schemas.

Choosing ISAM rather than SQL

Many developers seem to be either intimidated or a bit overwhelmed when an application design requires both procedural code and a substantial database. In most cases, "database" means an RDBMS with an SQL interface.
Yet it seems to me that many of the techniques for addressing the "impedance mismatch" between the two paradigms would be much better suited to an ISAM (indexed-sequential access method) toolset, where you can (must) specify tables, indexes, row-navigation, etc. overtly - exactly the behavior prescribed by the ActiveRecord model, for instance.
In early PC days, dBASE and its progeny were the dominant DBMS platforms, and it was an enhanced ISAM. FoxPro continues this lineage quite successfully through to today. MySQL and Informix are two RDBMSs that were at least initially built on top of ISAM implementations, so this approach should be at least equally performant. I get the feeling that many developers who are unhappy with SQL are at least unconsciously yearning for the ISAM approach to be revived, and the database could be more easily viewed as a set of massively efficient linkable hyper-arrays. It seems to me that it could be a really good idea.
Have you ever tried, say, an ORM-to-ISAM implementation? How successfully? If not, do you think it might be worth a try? Are there any toolsets for this model explicitly?
Maybe Pig Latin is what you want? According to this article
http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=693D79B5EFDC0452E1C9A87D1C495D4C?doi=10.1.1.124.5496&rep=rep1&type=pdf :
"Besides, many of the people who analyze this data are entrenched procedural programmers, who find the declarative, SQL style to be unnatural. The success of the more procedural map-reduce programming model, and its associated scalable implementations on commodity hardware, is evidence of the above. However, the map-reduce paradigm is too low-level and rigid, and leads to a great deal of custom user code that is hard to maintain, and reuse. We describe a new language called Pig Latin that we have designed to fit in a sweet spot between the declarative style of SQL, and the low-level, procedural style of map-reduce."
There are certainly times and places where ISAM provides the services needed by the application with less cost and overhead than a full-blown SQL DBMS. One downside of an ISAM mechanism is that there isn't necessarily a system catalogue to describe the data; another is that generally there are few user-friendly tools to get at the data. These are both places where the RDBMS provides considerable advantage. The best ISAM (or similar) systems provide transaction support - even XA transactions, sometimes.
Where you need to do complex joins and computations (aggregates, for example), the work done by the DBMS provides huge benefits. Where all you need is access to records, then ISAM could be beneficial.
Security tends to be harder to enforce with an ISAM-based system than with a DBMS. Also, you need to worry about integrity of the files in case of a crash. Most DBMS use a two-process architecture (DBMS client in a separate process from the DBMS server), which provides resilience in the face of the client crashing (or the client PC being turned off). You also have to worry about backup and restore - a competent DBMS has systems in place for providing a coherent backup of a database while the database is in use; it is not clear that ISAM systems would provide that level of integrity.
Overall, given a suitable ISAM mechanism, there would at least sometimes, maybe often, be advantages to using an ISAM mechanism in an ORM system instead of a full RDBMS.
I implemented an ORM-to-ISAM library back in the 1990s that enjoyed some (very) modest success as shareware. I largely agree with what you say about the virtues of ISAMs and I think it better to use an ISAM when building an ORM layer or product if you are looking only for flexibility and speed.
However, the risk that you take is that you'll lose out on the benefits of the wide range of SQL-related products now on the market. In particular, reporting tools have evolved to be ever more tightly integrated with the most popular SQL packages. While ISAM product vendors in the 1990s provided ODBC drivers to integrate with products like Crystal Reports, it seemed, even then, that the market was trending away from ISAM and that I would be risking obsolescence if I continued using that technology. Thus, I switched to SQL.
One caveat: it has been nearly a decade since I was playing in the ISAM sandbox so I cannot purport to be up on the latest ISAM tools and their solutions to this problem. However, unless I was convinced that I was not going to be trapped without reporting tools support, I would not adopt an ISAM-based ORM regardless of its virtues. And that doesn't even cover the other tools available for SQL-based development!
I did my share of dBase, Clipper and FoxPro. However I believe the relational model provided by SQL is infinitely more powerful and useful, and products like Oracle and SQL Server deserve their success in the marketplace.
I'm always surprised why people make such a big deal of creating a mapping layer for the ~80-90% of the cases and writing 10-20% of custom SQL to deal with complex queries (mostly reports) and batch data movement. I must be doing something really good or something really silly by adopting the DAL/DAO model, given the level of hatred against hibernate, active record, etc. - vide Vietnam discussion from earlier.
Multivalue database anyone? (aka Pick) Think XML without the tags. They predate RDBMS by at least a decade, and still going strong if you know where to look.
If you know exactly what you want to do with your data and how you want to do that, pick ISAM. You will be happy because you will have structured your indexes to serve your exact needs. Know upfront that if your needs change, you will want to change your indexing. Data access will be blazing fast.
If you are not sure what uses the data will be put to, or you know your data needs will change a lot over time, pick SQL. You will have the flexibility of ad hoc queries, quick reporting turnaround, data mining, etc.
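The trade-off in the two paragraphs above can be sketched with a toy indexed-sequential table. This is a simplified in-memory illustration, not a real ISAM engine, and `IsamTable`, `seek` and `scan` are names invented here: the index is fixed when the table is built, which makes keyed seeks and sequential reads cheap but leaves any other access path unsupported.

```python
import bisect


class IsamTable:
    """Toy indexed-sequential table with one index chosen up front."""

    def __init__(self, records, key):
        self._rows = sorted(records, key=lambda r: r[key])
        self._keys = [r[key] for r in self._rows]

    def seek(self, value):
        """Position on the first record whose key is >= value."""
        return bisect.bisect_left(self._keys, value)

    def scan(self, start, stop):
        """Read records sequentially for keys in [start, stop)."""
        pos = self.seek(start)
        while pos < len(self._rows) and self._keys[pos] < stop:
            yield self._rows[pos]
            pos += 1


orders = IsamTable(
    [
        {"id": 3, "city": "Oslo"},
        {"id": 1, "city": "Rome"},
        {"id": 2, "city": "Bern"},
    ],
    key="id",
)
cities = [row["city"] for row in orders.scan(2, 4)]
```

A question like "which orders are from Rome?" has no index to use here and would need either a full scan or a rebuilt table keyed on `city`, whereas an SQL engine would accept it as an ad hoc query.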
Both types of databases have matured over the years. Both can have robust servers with live backup, transactions, security, metadata, etc.
Old question, but interesting discussion. The concepts of ISAM are important, but the additional features that we're provided in today's RDBMSs (as discussed: backup, consistency, security, metadata) offer us significant benefits.
With the NoSQL craze (yes I said it...craze) it doesn't mean that we can't model ISAM-like access inside the RDBMS. You'll be sure I'm gonna push off as much logic to the DB as I possibly can, but there are times like "traditional" data gridding/multi-dimensional data interpolation where I'll traverse all necessary records via my own logical index.
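A rough sketch of what ISAM-like access inside an RDBMS could look like, using SQLite through Python's `sqlite3` module. The `samples` table, the helper names and the linear interpolation are assumptions made for illustration, not the answerer's actual code: the application seeks to the neighbouring keys through the primary-key index and does the arithmetic itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE samples (x REAL PRIMARY KEY, y REAL);
    INSERT INTO samples VALUES (0.0, 0.0), (1.0, 10.0), (2.0, 40.0), (3.0, 90.0);
    """
)


def seek_ge(conn, x):
    """First row with key >= x, like an ISAM seek followed by a read."""
    return conn.execute(
        "SELECT x, y FROM samples WHERE x >= ? ORDER BY x LIMIT 1", (x,)
    ).fetchone()


def seek_le(conn, x):
    """Last row with key <= x, like an ISAM seek followed by read-previous."""
    return conn.execute(
        "SELECT x, y FROM samples WHERE x <= ? ORDER BY x DESC LIMIT 1", (x,)
    ).fetchone()


def interpolate(conn, x):
    """Linear interpolation between the two stored neighbours of x."""
    lo, hi = seek_le(conn, x), seek_ge(conn, x)
    if lo is None or hi is None:
        raise ValueError("x is outside the stored range")
    if lo[0] == hi[0]:
        return lo[1]
    t = (x - lo[0]) / (hi[0] - lo[0])
    return lo[1] + t * (hi[1] - lo[1])


midpoint = interpolate(conn, 1.5)
```

The data still benefits from the engine's backup, transactions and security, while the traversal order is controlled by the application.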