How necessary or convenient is it to write portable SQL? - sql

Time and again, I've seen people here and everywhere else advocating avoidance of nonportable extensions to the SQL language, this being the latest example. I recall only one article stating what I'm about to say, and I don't have that link anymore.
Have you actually benefited from writing portable SQL and dismissing your dialect's proprietary tools/syntax?
I've never seen a case of someone taking pains to build a complex application on mysql and then saying You know what would be just peachy? Let's switch to (PostGreSQL|Oracle|SQL Server)!
Common libraries in -say- PHP do abstract the intricacies of SQL, but at what cost? You end up unable to use efficient constructs and functions, for a presumed glimmer of portability you most likely will never use. This sounds like textbook YAGNI to me.
EDIT: Maybe the example I mentioned is too snarky, but I think the point remains: if you are planning a move from one DBMS to another, you are likely redesigning the app anyway, or you wouldn't be doing it at all.

Software vendors who deal with large enterprises may have no choice (indeed that's my world) - their customers may have policies of using only one database vendor's products. To miss out on major customers is commercially difficult.
When you work within an enterprise you may be able to benefit from the knowledge of the platform.
Generally speaking the DB layer should be well encapsulated, so even if you had to port to a new database the change should not be pervasive. I think it's reasonable to take a YAGNI approach to porting unless you have a specific requriement for immediate multi-vendor support. Make it work with your current target database, but structure the code carefully to enable future portability.

The problem with extensions is that you need to update them when you're updating the database system itself. Developers often think their code will last forever but most code will need to be rewritten within 5 to 10 years. Databases tend to survive longer than most applications since administrators are smart enough to not fix things that aren't broken so they often don't upgrade their systems with every new version.Still, it's a real pain when you upgrade your database to a newer version yet the extensions aren't compatible with that one and thus won't work. It makes the upgrade much more complex and demands more code to be rewritten.When you pick a database system, you're often stuck with that decision for years.When you pick a database and a few extensions, you're stuck with that decision for much, much longer!

The only case where I can see it necessary is when you are creating software the client will buy and use on their own systems. By far the majority of programming does not fall into this category. To refuse to use vendor specific code is to ensure that you have a porrly performing database as the vendor specific code is usually written to improve the performance of certain tasks over ANSII Standard SQL and it written to take advatage of the specific architecture of that database. I've worked with datbases for over 30 years and never yet have I seen a company change their backend database without a complete application rewrite as well. Avoiding vendor-specific code in this case means that you are harming your performance for no reason whatsoever most of the time.
I have also used a lot of different commercial products with database backends through the years. Without exception, every one of them was written to support multiple backends and, without exception, every one of them was a miserable, slow dog of a program to actually use on a daily basis.

In the vast majority of applications I would wager there is little to no benefit and even a negative effect of trying to write portable sql; however, in some cases there is a real use case. Let's assume you are building a Time Tracking Web Application. And you'd like to offer a self hosted solution.
In this case your clients will need to have a DB Server. You have some options here. You could force them into using a specific version which could limit your client base. If you can support multiple DBMS then you have a wider potential client that can use your web application.

If you're corporate, then you use the platform you are given
If you're a vendor, you have to plan for multiple platforms
Longevity for corporate:
You'll probably rewrite the client code before you migrate DBMS
The DBMS will probably outlive your client code (Java or c# against '80 mainframe)
Remember:
SQL within a platform is usually backward compatible, but client libraries are not. You are forced to migrate if the OS can not support an old library, or security environment, or driver architecture, or 16 bit library etc
So, assume you had an app on SQL Server 6.5. It still runs with a few tweaks on SQL Server 2008. I bet you're not using the sane client code...

There are always some benefits and some costs to using the "lowest common denominator" dialect of a language in order to safeguard portability. I think the dangers of lock-in to a particular DBMS are low, when compared to the similar dangers for programming languges, object and function libraries, report writers, and the like.
Here's what I would recommend as the primary way of safeguarding future portability. Make a logical model of the schema that includes tables, columns, constraints and domains. Make this as DBMS independent as you can, within the context of SQL databases. About the only thing that will be dialect dependent is the datatype and size for a few domains. Some older dialects lack domain support, but you should make your logical model in terms of domains anyway. The fact that two columns are drawn from the same domain, and don't just share a common datatype and size, is of crucial importance in logical modelling.
If you don't understand the distinction between logical modeling and physical modeling, learn it.
Make as much of the index structure portable as you can. While each DBMS has its own special index features, the relationship between indexes, tables, and columns is just about DBMS independent.
In terms of CRUD SQL processing within the application, use DBMS specific constructs whenever necessary, but try to keep them documented. As an example, I don't hesitate to use Oracle's "CONNECT BY" construct whenever I think it will do me some good. If your logical modeling has been DBMS independent, much of your CRUD SQL will also be DBMS independent even without much effort on your part.
When it comes time to move, expect some obstacles, but expect to overcome them in a systematic way.
(The word "you" in the above is to whom it may concern, and not to the OP in particular.)

Related

ORM and NH for business people

I have to prepare a case to convince managers to promote development using an ORM. I don't want to go into technical details in this case, the benefits have to be visible to business people.
I'm not quite happy with the arguments I've written down until now Are there any points I'm forgetting, both PRO and CONTRA?
The case I'm going to make will be in two points:
Convince managers to use an ORM
Convince managers to use NHibernate
The PRO's for ORM:
Solves the impedence mismatch between a rich ecosystem of connected objects with behaviour and tablular lists of scalar values
Higher productivity => reduced time writing tedious data access code lets you focus more on solving 'real' business issues
Higher maintainability => reduced number of LOC == system is easier to understand (hmm... maybe...)
Almost no performance hit when used right
The CONTRA's for ORM:
O/R mapping tools do not perform well with bulk processing of data. Stored procedures may have better performance, but are not portable
Heavy reliance on ORM software has been pointed to as a major factor in producing poorly designed databases
The PRO's for NH:
Very mature produce
Supports a lot of DB's => developers don't have to learn a new SQL dialect on every other project
High mind-share amongst .net community leaders
Many examples, articles, blog posts
It's open source
The CONTRA's for NH:
Not suited at all for batch processing
No code generation or code designer => some people think this makes developers more productive
Bad reputation due to lazy coding (== abuse of lazy loading)
It's not from Microsoft
It's open source => some companies just don't like that
Business people typically think in terms of cost and deliverables. Beyond that, most don't grasp or care about the technical reasons.
You already mentioned the higher productivity aspect.. try phrasing that as spending more time on business rules and less time on repetitive CRUD code.
I'd add, and highlight this:
Lower development friction makes meeting deadlines easier
Reduces cost of development in time spent working with the database
Reduces cost of maintenance
NHib is flexible; can handle object mapping automatically, and allow specific queries/stored procedures when needed
Also checkout Fluent NHibernate and Fluent Migrator.
I'd actually like to rebuke some of the negative points.
O/R mapping tools do not perform well with bulk processing of data. Stored procedures may have better performance, but are not portable
NH has many optimizations for bulk data processing (batching and caching are the first that come to mind). And you can always add stored procedures for some cases, it's not an "all-or-nothing" proposal.
Heavy reliance on ORM software has been pointed to as a major factor in producing poorly designed databases
I've actually seen the opposite: supposedly optimized DB-first designs that fall apart when the real use cases are implemented. In any case, it's a developer failure; you can do poorly with or without an ORM
Not suited at all for batch processing
Absolutely not true. NH has many features specifically designed for batch processing, like Stateless Sessions. Of course it's hard to beat the performance of a SP running in the DB server, but apart from that, it'll usually do just as well as adhoc ADO.NET code.
No code generation or code designer
False. There are several products that provide that. Check http://nhforge.org/wikis/general/commercial-product-ecosystem.aspx
Bad reputation due to lazy coding
If you do SELECT * FROM TABLE it will perform badly too. I fail to see how NH is to blame.
It's not from Microsoft
Neither are Oracle, the iPod or BMWs, yet people use them
It's open source
There is commercial support available. And, unlike what happens with MS products, the support is provided by people who know the internals and can fix them in a few hours instead of a few years.
When your manager decides which technology you have to use, your manager should be able to understand the technical reasons. Otherwise, he should trust the developers (they have the knowledge to make the decision) to decide. I can think of more useful things for a manager than making technical decisions he does not understand enough.
When your boss is like Dilberts pointy haired boss, I would go for the time and money saving argument (for every technical decision the manager wants to be involved in).
Take a look at this list of features of a real-world ORM. For any non-trivial project you will eventually end up using 80%+ of those features. So, either you can build those features yourself or you can use NHibernate. If you choose to build it yourself, be prepared to invest about $7.5M and 140 person years.
Or you can just use NHibernate, save the money and effort, and benefit from the huge knowledge base available on the net, books, community, etc, and even use (if necessary) one of the high-quality commercial support providers available.
You need to express it in terms of time and money - impact on development and maintenance time and impact on user time. ORM use is way too often the cause of a system that is strangling itself in poor performance (and which is now too far along the design path to change), so you need to address to the managers how you intend to avoid doing this. Frankly dev time and maintenance time are peanuts compared to wasted users time from bad implementation of an ORM. Yes it can be done right to avoid that but it usually is not. Often this is because the devs who use ORM don't understand databases and don't want to understand databases. ORM in the hands of a database expert, good thing, ORM in the hands of an application programmer who can't write basic SQL and who doesn't even understand joins, disaster waiting to happen.
No code generation or code designer
As Diego pointed out, there are third party tools. However, I would like to stress that NOT having to use a designer is a strength of NHibernate. Designers typically don't scale in the following ways:
What good is a design surface when an application has many, many entities? A visual designer just slows you down with noise at that point.
Designers typically have issues with merging. They work great when only one developer needs to edit the model at a time. The more devs you have on a project, the more potential trouble there is for merging the designer files.

Choosing ISAM rather than SQL

Many developers seem to be either intimidated or a bit overwhelmed when an application design requires both procedural code and a substantial database. In most cases, "database" means an RDBMS with an SQL interface.
Yet it seems to me that many of the techniques for addressing the "impedance mismatch" between the two paradigms would be much better suited to an ISAM (indexed-sequential access method) toolset, where you can (must) specify tables, indexes, row-naviagation, etc. overtly - exactly the behavior prescribed by the ActiveRecord model, for instance.
In early PC days, dBASE and its progeny were the dominant dbms platforms, and it was an enhanced ISAM. Foxpro continues this lineage quite successfully through to today. MySQL and Informix are two RDBMSs that were at least initially built on top of ISAM implementations, so this approach should be at least equally performant. I get the feeling that many developers who are unhappy with SQL are at least unconsciously yearning for the ISAM approach to be revived, and the database could be more easily viewed as a set of massively efficient linkable hyper-arrays. It seems to me that it could be a really good idea.
Have you ever tried, say, an ORM-to-ISAM implementation? How successfully? If not, do you think it might be worth a try? Are there any toolsets for this model explicitly?
Maybe Pig Latin is what you want? According to this article
http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=693D79B5EFDC0452E1C9A87D1C495D4C?doi=10.1.1.124.5496&rep=rep1&type=pdf :
"Besides, many of the people who ana-
lyze this data are entrenched
procedural programmers, who find the
declarative, SQL style to be
unnatural. The success of the more
procedural map-reduce programming
model, and its associated scalable
implementations on commodity hard-
ware, is evidence of the above.
However, the map-reduce paradigm is
too low-level and rigid, and leads to
a great deal of custom user code that
is hard to maintain, and reuse. We
describe a new language called Pig
Latin that we have designed to fit in a
sweet spot between the declarative
style of SQL, and the low-level,
procedural style of map-reduce."
There are certainly times and places where ISAM provides the services needed by the application with less cost and overhead than a full-blown SQL DBMS. One downside of an ISAM mechanism is that there isn't necessarily a system catalogue to describe the data; another is that generally there are few user-friendly tools to get at the data. These are both places where the RDBMS provides considerable advantage. The best ISAM (or similar) systems provide transaction support - even XA transactions, sometimes.
Where you need to do complex joins and computations (aggregates, for example), the work done by the DBMS provides huge benefits. Where all you need is access to records, then ISAM could be beneficial.
Security tends to be harder to enforce with an ISAM-based system than with a DBMS. Also, you need to worry about integrity of the files in case of a crash. Most DBMS use a two-process architecture (DBMS client in a separate process from the DBMS server), which provides resilience in the face of the client crashing (or the client PC being turned off). You also have to worry about backup and restore - a competent DBMS has systems in place for providing a coherent backup of a database while the database is in use; it is not clear that ISAM systems would provide that level of integrity.
Overall, given a suitable ISAM mechanism, there would at least sometimes, maybe often, advantages to using an ISAM mechanism in an ORM system instead of a full RDBMS.
I implemented an ORM-to-isam library back in the 1990s that enjoyed some (very) modest success as shareware. I largely agree with what you say about the virtues of ISAMs and I think it better to use an ISAM when building an ORM layer or product if you are looking only for flexibility and speed.
However, the risk that you take is that you'll lose out on the benefits of the wide range of SQL-related products now on the market. In particular, reporting tools have evolved to be ever more tightly integrated with the most popular SQL packages. While ISAM product vendors in the 1990s provided ODBC drivers to integrate with products like Crystal Reports, it seemed, even then, that the market was trending away from ISAM and that I would be risking obsolescence if I continued using that technology. Thus, I switched to SQL.
One caveat: it has been nearly a decade since I was playing in the ISAM sandbox so I cannot purport to be up on the latest ISAM tools and their solutions to this problem. However, unless I was convinced that I was not going to be trapped without reporting tools support, I would not adopt an ISAM-based ORM regardless of its virtues. And that doesn't even cover the other tools available for SQL-based development!
I did my share of dBase, Clipper and FoxPro. However I believe the relational model provided by SQL is infinitely more powerful and useful, and products like Oracle and SQL Server deserve their success in the marketplace.
I'm always surprised why people make such a big deal of creating a mapping layer for the ~80-90% of the cases and writing 10-20% of custom SQL to deal with complex queries (mostly reports) and batch data movement. I must be doing something really good or something really silly by adopting the DAL/DAO model, given the level of hatred against hibernate, active record, etc. - vide Vietnam discussion from earlier.
Multivalue database anyone? (aka Pick) Think XML without the tags. They predate RDBMS by at least a decade, and still going strong if you know where to look.
If you know exactly what you want to do with your data and how you want to do that, pick ISAM. You will be happy because you will have structured your indexes to serve your exact needs. Know upfront that if your needs change, you will want to change your indexing. Data access will be blazing fast.
If you are not sure what uses the data will be put to, or you know your data needs will change a lot over time, pick SQL. You will have the flexibility of ad hoc queries, quick reporting turnaround, data mining, etc.
Both types of databases have matured over the years. Both can have robust servers with live backup, transactions, security, metadata, etc.
Old question, but interesting discussion. The concepts of ISAM are important, the additional features that we're provided in today's RDBMSs (as discussed i.e. backup, consistency, security, metadata) offer us signficant benefits.
With the NoSQL craze (yes I said it...craze) it doesn't mean that we can't model ISAM-like access inside the RDBMS. You'll be sure I'm gonna push off as much logic to the DB as I possibly can, but there are times like "traditional" data gridding/multi-dimensional data interpolation where I'll traverse all necessary records via my own logical index.

Why does no database fully support ANSI or ISO SQL standards?

If I were designing a oil refinery, I wouldn't expect that materials from different vendors would not comply with published standards in subtle yet important ways. Pipework, valves and other components from one supplier would come with flanges and wall thicknesses to ANSI standards, as would the same parts from any other supplier. Interoperability and system safety is therefore assured.
Why then are the common databases so choosy about which parts of the standards they adhere to, and why have no 100% standards-compliant systems come to the fore? Are the standards 'broken', lacking in scope or too difficult to design for?
Taking this to conclusion; what is the point of ANSI (or ISO) defining standards for SQL?
Edit: List of implementation differences between common databases
In the software industry you have some standards that are really standards, i.e., products that don't comply with them just don't work. File specifications fall into that category. But then you also have "standards" that are more like guidelines: they may defined as standards with point-by-point definitions, but routinely implemented only partially or with significant differences. Web development is full of such "standards", like HTML, CSS and "ECMAScript" where different vendors (i.e. web browsers) implement the standards differently.
The variation causes headaches, but the standardization still provides benefits. Imagine if there were no HTML standard at all and each browser used its own markup language. Likewise, imagine if there were no SQL standard and each database vendor used its own completely proprietary querying language. There would be much more vendor lock-in, and developers would have a much harder time working with more than one product.
So, no, ANSI SQL doesn't serve the same purpose as ANSI standards do in other industries. But it does serve a useful purpose nonetheless.
Probably because standards conformance is of a low priority to database system purchasers. They are more interested in:
compatibility with what they've already got
performance
price
OS support
to name but a few factors.
The same is true of programming languages - very few (if any) compilers support every single feature of the current ANSI C and C++ standards.
As to why bother with standard, well most vendors do eventually bring standard support on board. For example, most vendors support more or less all of SQL89. This allows the vendor to tick a (relatively unimportant) check-box on their spec sheet and also allow people like me who are interested in writing portable code to do so, albeit having to forgo lots of bells and whistles.
See the article "IS SQL A REAL STANDARD ANYMORE?" for a discussion about the current (2005) issues of the SQL standard.
Indeed, the ANSI SQL standard is not often followed. Just read SO: most SQL threads never refer to the standard while, for instance, discussions on network protocols often include the actual quote, chapter and verse of the relevant RFC.
I always suspected that one of the reasons is the fact that the SQL standard is not freely distributable. Simply getting it is not trivial. Various unofficial copies float around.)
Another reason is that it is a very complicated text and poorly organized. It uses a strange vocabulary (such as "authID" instead "user"). You need books just to understand the standard ("A guide to the SQL standard", C.J. Date, Hugh Darwen -
Addison-Wesley).
Mimer SQL has great standards support, yet it is pretty unknown. It is in production in several large sites, mostly in Sweden. But I think a lot of sites are migrating to others.
Detailed support statments:
SQL-99
SQL-2003
PSM
Database triggers and functions according to SQL:1999
Binary and Character Large OBjects (BLOB/CLOB/NCLOB) support according to SQL:1999
Multi-database transactions (two-phase commit) conforming to Open Group's XA-standard
Support for Java ME CDC Foundation Profile and Java ME CLDC/MIDP
I don't know the history of ANSI SQL specifically. But it seems that many times in software development, standards are written after the major players have already implemented their own proprietary versions of things. Once a company is invested in its own way of doing things, it's really hard to justify changing or removing features people have come to rely on just to adhere to a standard. Web standards are a primary example of this phenomenon.
A few years ago, one of the worst industries as far as pipes and connectors being mutually compatible was firefighting equipment. There were literally dozens of mutually incompatible hose to pump connections. When mutual aid became commonplace among fire fighters, they had to bring along dozens of adaptors to be able to interoperate their equipment.
On September 11, both police and firefighters had private wireless networks for intercommunicating among their people. But, guess what? The two systems were not mutually compatible. So there were needless delays in communicating information from one kind of public safety servant to another.
If you go back far enough, you can find a time in New York City where about half the electric grid was DC, favored by Edison, and the other half was AC, favored by Westinghouse.
Standards sometimes come about by themselves, and are called de facto standards. More commonly, standards have to be set forth by a body specifically empowered to make it happen. As to the SQL standards, some of the largest vendors pre date the standard. In order to comply with the standard, they would have to put in engineering expense that doesn't benefit their existing client base. Worse yet, they might end up being incompatible with their own prior product.
Full compliance with the SQL standard might yet happen, but it's unlikely. Even if it does, there's a delay time between the evolution of a new SQL standard and compliance with it.
IMHO, the DB vendors push forward the ANSI SQL standards to include new features & constructs within their field much more than ANSI telling the DB vendors the "one true way".
The DB market is driven by features, scalability and cost. It is not a commercial priority to forego and delay a technical advantage (i.e. partitioning, pivot, UPSERT, replication) by waiting for ANSI to ratify the syntax. By the time that has been done, there is already a significant installation of the proprietary syntax.
That being said, most DB vendors have improved their core "ANSI SQL" support greatly in the last few years. (SQL Server with the SELECT FROM INFORMATION_SCHEMA and Oracle's ANSI joins actually working as well as native joins under the CBO)
According to the HSQLDB manual, it is the most standards compliant RDBMS.
Almost all syntactic features of SQL-92 up to Advanced Level are supported
SQL:2008 core and many optional features of this standard
The real reason: most "developers" are client-centric coders, and therefore neither understand nor care about Dr. Codd's 12 rules. This is also why MySql, which isn't a relational database in any significant manner, is frequently seen in webKiddie development. Such developers only want rudimentary SELECT, UPDATE, DELETE parsing. They eschew constraints of any kind, preferring to "do it in the application". Reactionary 1960's applications are what you get.
Most of them are pretty compliant. But here's the bad news, IMO - standards breed mediocrity. Vendors want you to get locked into their extensions, and there's often good reason to do to nonstandard things. Realistically, how likely are you to dump Oracle for SQL Server, or vice-versa? Unless you build a product that your cusotmers may use against other databases, you as an enterprise are unlikely to swap out DB's. Too painful.
Companies usually tend to use one vendor to avoid having a jungle of different and possibly incompatible systems to support. It's also a lot cheaper to train your developers/system engineers in using one database vendor's tools than in 3 different sets of tools. Later on, this company may then grow large enough to buy some of its competitors. This would mean another entirely different set of tools that you'd have to manage, integrate, etc.
That's a lot of work.
Imagine that instead of that, you have a portability layer so that you really don't care what's beneath. That would be a lot better.

How to select an SQL database? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
We're living in a golden age of databases, with numerous high quality commercial and free databases. This is great, but the downside is there's not a simple obvious choice for someone who needs a database for his next project.
What are the constraints/criteria you use for selecting a database?
How well do the various databases you've used meet those constraints/criteria?
What special features do the databases have?
Which databases do you feel comfortable recommending to others?
etc...
I would think first on what the system requirements are for data access, data security, scalability, performance, disconnected scenarios, data transformation, data sizing.
On the other side, consider also the experience and background of developers, operators, platform administrators.
You should also think on what constraints you have regarding programming languages, operating systems, memory footprint, network bandwidth, hardware.
Last, but not least, you have to think about business issues like budget for licences, support, operation.
After all those considerations you should end up with just a couple of options and the selection should be easier.
In other words, select the technology that suits the best the constraints and needs of your organization and project.
I certainly think that you are right on saying that it is not an obvious choice given the wide number of alternatives, but this is the only way I think you can narrow them to the ones that are really feasible for your project.
My selection criteria (mainly programming centric):
Maintenance: How are updates/hotfixes installed?
Transaction control: How it is implemented
Are Stored Procedures supported?
Can you use exception handling in Stored Procedures?
Costs
As a benefit: Can you use recursion on Stored Procedures? (E.g. in SQL Server 2000 the recursion stops after 32 passes IIRC)
For most people in a corporate environment the choice comes down to "the one we have".
Since you seem to be fortunate enough to have a choice, I'll take a quick run through the questions and maybe pose a few more at the end.
The biggest criterion may be cost. Do you want/are you prepared to pay for your DBMS platform? If not, then Oracle, MS SQL Server, Sybase and others are probably out, although if you're not building a commercial app then there may be some wiggle room. Also, platform - can you run the software on your hardware?
Other dimensions for consideration might include expected number of concurrent connections, transactional vs mostly reads, size, availability and I guess lots of others.
"Special features" are, in the main, to be avoided - in my cynical world-view they're intended to lock you into a platform. So something like Oracle's PL/SQL is a feature that, while powerful (and likely to mean the need for extra CPU power at more licensing cost) is not portable. If you expect extremely high volumes then partitioning may be useful, I suppose.
I have worked with Oracle, MS SQL Server, MySQL, PostreSQL, SQLite and Sybase that I can think of. I'd happily recommend all but Sybase, about which I have some concerns these days (I could easily be wrong, but personally I think the money could be better spent elsewhere) but not all for the same applications.
Ideally, I like to have the warm feeling that it doesn't really matter what DB platform I'm using because I can port easily. With a good abstraction layer between data and business logic, I should be able to develop locally against, say, the excellent SQLite and implement painlessly on, for example, Postgres. With something like ActiveRecord from Rails coupled with a little awareness of things like differences in reserved words, this is almost completely cost-free.
Surely the most compelling factor is the expertise of you or your team...or the pool of resource you are likely to hire in the future. I would tend to go with the grain most of the time, using MySQL in a LAMP team and SQL Server in a MS team, since either of these products is capable of doing everything necessary even in a high-load environment.
The benefits of any other database are going to be marginal compared to the pain of learning how to use it well. The only exception to this, in my opinion, would be in a high-demand environment where:
a. the obvious choice has been tried and is failing
b. the benefits of scaling multiply the marginal benefit to such a degree that it will be worth the cost of using something unexpected.
I would assume the need to hire at least two and preferably three excellent DBAs with long term familiarity with the new database.
And first I would try to hire them for the technology that was failing, because it is more likely to be the way it's used than the technology itself that is causing the problem.
The existing answers are great. It's worth bearing in mind that Oracle now has an XE version of it's 10g database which is available for free and comes with Application Express, a great web based development environment.
It is limited, 4GB HD, 1 GB Ram and uses only one CPU. This is enough to run smaller system though and can be upgraded easily at a later date if necessary. Oracle can be one of the toughest to learn but is also one of the best to have on your CV :-)
I think SQLServer from Microsoft also has a 'starter' type database. Don't discount the commercial products - if you are going to bet your company on a database technology I would rather be using a product from Oracle or Microsoft personally. Thats not to say there is anything wrong with Open Source.
Spend a while evaluating them :-)
Linux, Web Hosted - MySQL (PostreSQL maybe)
Mainstream SME - MS SQL
Big Iron (banking etc) - Oracle
Thinking about anything other than those three is masturbation - any of the other databases becomes a discussion about niche products to solve particular problems that you probably haven't encountered yet. If you choose anything other than the three above you will -
Struggle to find people to work on the project or keep the database going
Struggle to motivate your decision without an academic discussion
Someone will curse you, your ancestors and your lineage a few years down the line - and replace your choice anyway.
Niche databases are not where architectural strides are made - it is technologies like middleware, messaging, cloud services etc where you can afford to (and should) go out on a limb to find good products.

Why do we need entity objects? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
Improve this question
I really need to see some honest, thoughtful debate on the merits of the currently accepted enterprise application design paradigm.
I am not convinced that entity objects should exist.
By entity objects I mean the typical things we tend to build for our applications, like "Person", "Account", "Order", etc.
My current design philosophy is this:
All database access must be accomplished via stored procedures.
Whenever you need data, call a stored procedure and iterate over a SqlDataReader or the rows in a DataTable
(Note: I have also built enterprise applications with Java EE, java folks please substitute the equvalent for my .NET examples)
I am not anti-OO. I write lots of classes for different purposes, just not entities. I will admit that a large portion of the classes I write are static helper classes.
I am not building toys. I'm talking about large, high volume transactional applications deployed across multiple machines. Web applications, windows services, web services, b2b interaction, you name it.
I have used OR Mappers. I have written a few. I have used the Java EE stack, CSLA, and a few other equivalents. I have not only used them but actively developed and maintained these applications in production environments.
I have come to the battle-tested conclusion that entity objects are getting in our way, and our lives would be so much easier without them.
Consider this simple example: you get a support call about a certain page in your application that is not working correctly, maybe one of the fields is not being persisted like it should be. With my model, the developer assigned to find the problem opens exactly 3 files. An ASPX, an ASPX.CS and a SQL file with the stored procedure. The problem, which might be a missing parameter to the stored procedure call, takes minutes to solve. But with any entity model, you will invariably fire up the debugger, start stepping through code, and you may end up with 15-20 files open in Visual Studio. By the time you step down to the bottom of the stack, you forgot where you started. We can only keep so many things in our heads at one time. Software is incredibly complex without adding any unnecessary layers.
Development complexity and troubleshooting are just one side of my gripe.
Now let's talk about scalability.
Do developers realize that each and every time they write or modify any code that interacts with the database, they need to do a throrough analysis of the exact impact on the database? And not just the development copy, I mean a mimic of production, so you can see that the additional column you now require for your object just invalidated the current query plan and a report that was running in 1 second will now take 2 minutes, just because you added a single column to the select list? And it turns out that the index you now require is so big that the DBA is going to have to modify the physical layout of your files?
If you let people get too far away from the physical data store with an abstraction, they will create havoc with an application that needs to scale.
I am not a zealot. I can be convinced if I am wrong, and maybe I am, since there is such a strong push towards Linq to Sql, ADO.NET EF, Hibernate, Java EE, etc. Please think through your responses, if I am missing something I really want to know what it is, and why I should change my thinking.
[Edit]
It looks like this question is suddenly active again, so now that we have the new comment feature I have commented directly on several answers. Thanks for the replies, I think this is a healthy discussion.
I probably should have been more clear that I am talking about enterprise applications. I really can't comment on, say, a game that's running on someone's desktop, or a mobile app.
One thing I have to put up here at the top in response to several similar answers: orthogonality and separation of concerns often get cited as reasons to go entity/ORM. Stored procedures, to me, are the best example of separation of concerns that I can think of. If you disallow all other access to the database, other than via stored procedures, you could in theory redesign your entire data model and not break any code, so long as you maintained the inputs and outputs of the stored procedures. They are a perfect example of programming by contract (just so long as you avoid "select *" and document the result sets).
Ask someone who's been in the industry for a long time and has worked with long-lived applications: how many application and UI layers have come and gone while a database has lived on? How hard is it to tune and refactor a database when there are 4 or 5 different persistence layers generating SQL to get at the data? You can't change anything! ORMs or any code that generates SQL lock your database in stone.
I think it comes down to how complicated the "logic" of the application is, and where you have implemented it. If all your logic is in stored procedures, and all your application does is call those procedures and display the results, then developing entity objects is indeed a waste of time. But for an application where the objects have rich interactions with one another, and the database is just a persistence mechanism, there can be value to having those objects.
So, I'd say there is no one-size-fits-all answer. Developers do need to be aware that, sometimes, trying to be too OO can cause more problems than it solves.
Theory says that highly cohesive, loosely coupled implementations are the way forward.
So I suppose you are questioning that approach, namely separating concerns.
Should my aspx.cs file be interacting with the database, calling a sproc, and understanding IDataReader?
In a team environment, especially where you have less technical people dealing with the aspx portion of the application, I don't need these people being able to "touch" this stuff.
Separating my domain from my database protects me from structural changes in the database, surely a good thing? Sure database efficacy is absolutely important, so let someone who is most excellent at that stuff deal with that stuff, in one place, with as little impact on the rest of the system as possible.
Unless I am misunderstanding your approach, one structural change in the database could have a large impact area with the surface of your application. I see that this separation of concerns enables me and my team to minimise this. Also any new member of the team should understand this approach better.
Also, your approach seems to advocate the business logic of your application to reside in your database? This feels wrong to me, SQL is really good at querying data, and not, imho, expressing business logic.
Interesting thought though, although it feels one step away from SQL in the aspx, which from my bad old unstructured asp days, fills me with dread.
One reason - separating your domain model from your database model.
What I do is use Test Driven Development so I write my UI and Model layers first and the Data layer is mocked, so the UI and model is build around domain specific objects, then later I map these objects to what ever technology I'm using the the Data Layer. Its a bad idea to let the database structure determine the design of your application. Where possible write the app first and let that influence the structure of your database, not the other way around.
For me it boils down to I don't want my application to be concerned with how the data is stored. I'll probably get slapped for saying this...but your application is not your data, data is an artifact of the application. I want my application to be thinking in terms of Customers, Orders and Items, not a technology like DataSets, DataTables and DataRows...cuz who knows how long those will be around.
I agree that there is always a certain amount of coupling, but I prefer that coupling to reach upwards rather than downwards. I can tweak the limbs and leaves of a tree easier than I can alter it's trunk.
I tend to reserve sprocs for reporting as the queries do tend to get a little nastier than the applications general data access.
I also tend to think with proper unit testing early on that scenario's like that one column not being persisted is likely not to be a problem.
Eric,
You are dead on. For any really scalable / easily maintained / robust application the only real answer is to dispense with all the garbage and stick to the basics.
I've followed a similiar trajectory with my career and have come to the same conclusions. Of course, we're considered heretics and looked at funny. But my stuff works and works well.
Every line of code should be looked at with suspicion.
I would like to answer with an example similar to the one you proposed.
On my company I had to build a simple CRUD section for products, I build all my entities and a separate DAL. Later another developer had to change a related table and he even renamed several fields. The only file I had to change to update my form was the DAL for that table.
What (in my opinion) entities brings to a project is:
Ortogonality: Changes in one layer might not affect other layers (off course if you make a huge change on the database it would ripple through all the layers but most small changes won't).
Testability: You can test your logic with out touching your database. This increases performance on your tests (allowing you to run them more frequently).
Separation of concerns: In a big product you can assign the database to a DBA and he can optimize the hell out of it. Assign the Model to a business expert that has the knowledge necessary to design it. Assign individual forms to developers more experienced on webforms etc..
Finally I would like to add that most ORM mappers support stored procedures since that's what you are using.
Cheers.
I think you may be "biting off more than you can chew" on this topic. Ted Neward was not being flippant when he called it the "Vietnam of Computer Science".
One thing I can absolutely guarantee you is that it will change nobody's point of view on the matter, as has been proven so often on innumerable other blogs, forums, podcasts etc.
It's certainly ok to have open disucssion and debate about a controversial topic, it's just this one has been done so many times that both "sides" have agreed to disagree and just got on with writing software.
If you want to do some further reading on both sides, see articles on Ted's blog, Ayende Rahein, Jimmy Nilson, Scott Bellware, Alt.Net, Stephen Forte, Eric Evans etc.
#Dan, sorry, that's not the kind of thing I'm looking for. I know the theory. Your statement "is a very bad idea" is not backed up by a real example. We are trying to develop software in less time, with less people, with less mistakes, and we want the ability to easily make changes. Your multi-layer model, in my experience, is a negative in all of the above categories. Especially with regards to making the data model the last thing you do. The physical data model must be an important consideration from day 1.
I found your question really interesting.
Usually I need entities objects to encapsulate the business logic of an application. It would be really complicated and inadequate to push this logic into the data layer.
What would you do to avoid these entities objects? What solution do you have in mind?
Entity Objects can facilitate cacheing on the application layer. Good luck caching a datareader.
We should also talk about the notion what entities really are.
When I read through this discussion, I get the impression that most people here are looking at entities in the sense of an Anemic Domain Model.
A lot of people are considering the Anemic Domain Model as an antipattern!
There is value in rich domain models. That is what Domain Driven Design is all about.
I personally believe that OO is a way to conquer complexity. This means not only technical complexity (like data-access, ui-binding, security ...) but also complexity in the business domain!
If we can apply OO techniques to analyze, model, design and implement our business problems, this is a tremendous advantage for maintainability and extensibility of non-trivial applications!
There are differences between your entities and your tables. Entities should represent your model, tables just represent the data-aspect of your model!
It is true that data lives longer than apps, but consider this quote from David Laribee: Models are forever ... data is a happy side effect.
Some more links on this topic:
Why Setters and Getters are evil
Return of pure OO
POJO vs. NOJO
Super Models Part 2
TDD, Mocks and Design
Really interesting question. Honestly I can not prove why entities are good. But I can share my opinion why I like them. Code like
void exportOrder(Order order, String fileName){...};
is not concerned where order came from - from DB, from web request, from unit test, etc. It makes this method more explicitly declare what exactly it requires, instead of taking DataRow and documenting which columns it expects to have and which types they should be. Same applies if you implement it somehow as stored procedure - you still need to push record id to it, while it not necessary should be present in DB.
Implementation of this method would be done based on Order abstraction, not based on how exactly it is presented in DB. Most of such operations which I implemented really do not depend on how this data is stored. I do understand that some operations require coupling with DB structure for perfomance and scalability purposes, just in my experience there are not too much of them. In my experience very often it is enough to know that Person has .getFirstName() returning String, and .getAddress() returning Address, and address has .getZipCode(), etc - and do not care which tables are involed to store that data.
If you have to deal with such problems as you described, like when additional column breaks report perfomance, then for your tasks DB is a critical part, and you indeed should be as close as possible to it. While entities can provide some convenient abstractions they can hide some important details as well.
Scalability is interesting point here - most of websites which require enormous scalability (like facebook, livejournal, flickr) tend to use DB-ascetic approach, when DB is used as rare as possible and scalability issues are solved by caching, especially by RAM usage. http://highscalability.com/ has some interesting articles on it.
There are other good reasons for entity objects besides abstraction and loose coupling. One of the things I like most is the strong typing that you can't get with a DataReader or a DataTable. Another reason is that when done well, proper entity classes can make the code more maintanable by using first-class constructs for domain-specific terms that anyone looking at the code is likely to understand rather than a bunch of strings with field names in them used for indexing a DataRow. Stored procedures are really orthogonal to the use of an ORM since a lot of mapping frameworks give you the ability to map to sprocs.
I wouldn't consider sprocs + datareaders a substitute for a good ORM. With stored procedures, you're still constrained by, and tightly-coupled to, the procedure's type signature, which uses a different type system than the calling code. Stored procedures can be subject to modification to acommodate additional options and schema changes. An alternative to stored procedures in the case where the schema is subject to change is to use views--you can map objects to views and then re-map views to the underlying tables when you change them.
I can understand your aversion to ORMs if your experience mainly consists of Java EE and CSLA. You might want to have a look at LINQ to SQL, which is a very lightweight framework and is primarily a one-to-one mapping with the database tables but usually only needs minor extension for them to be full-blown business objects. LINQ to SQL can also map input and output objects to stored procedures' paramaters and results.
The ADO.NET Entity framework has the added advantage that your database tables can be viewed as entity classes inheriting from each other, or as columns from multiple tables aggregated into a single entity. If you need to change the schema, you can change the mapping from the conceptual model to the storage schema without changing the actual application code. And again, stored procedures can be used here.
I think that more IT projects in enterprises fail because of unmaintainability of the code or poor developer productivity (which can happen from, e.g., context switching between sproc-writing and app-writing) than scalability problems of an application.
I would also like to add to Dan's answer that separating both models could enable your application to be run on different database servers or even database models.
What if you need to scale your app by load balancing more than one web server? You could install the full app on all web servers, but a better solution is to have the web servers talk to an application server.
But if there aren't any entity objects, they won't have very much to talk about.
I'm not saying that you shouldn't write monoliths if its a simple, internal, short life application. But as soon as it gets moderately complex, or it should last a significant amount of time, you really need to think about a good design.
This saves time when it comes to maintaining it.
By splitting application logic from presentation logic and data access, and by passing DTOs between them, you decouple them. Allowing them to change independently.
You might find this post on comp.object interesting.
I'm not claiming to agree or disagree but it's interesting and (I think) relevant to this topic.
A question: How do you handle disconnected applications if all your business logic is trapped in the database?
In the type of Enterprise application I'm interested in, we have to deal with multiple sites, some of them must be able to function in a disconnected state.
If your business logic is encapsulated in a Domain layer that is simple to incorporate into various application types -say, as a dll- then I can build applications that are aware of the business rules and are able, when necessary, to apply them locally.
In keeping the Domain layer in stored procedures on the database you have to stick with a single type of application that needs a permanent line-of-sight to the database.
It's ok for a certain class of environments, but it certainly doesn't cover the whole spectrum of Enterprise applications.
#jdecuyper, one maxim I repeat to myself often is "if your business logic is not in your database, it is only a recommendation". I think Paul Nielson said that in one of his books. Application layers and UI come and go, but data usually lives for a very long time.
How do I avoid entity objects? Stored procedures mostly. I also freely admit that business logic tends to reach through all layers in an application whether you intend it to or not. A certain amount of coupling is inherent and unavoidable.
I have been thinking about this same thing a lot lately; I was a heavy user of CSLA for a while, and I love the purity of saying that "all of your business logic (or at least as much as is reasonably possible) is encapsulated in business entities".
I have seen the business entity model provide a lot of value in cases where the design of the database is different than the way you work with the data, which is the case in a lot of business software.
For example, the idea of a "customer" may consist of a main record in a Customer table, combined with all of the orders the customer has placed, as well as all the customer's employees and their contact information, and some of the properties of a customer and its children may be determined from lookup tables. It's really nice from a development standpoint to be able to work with the Customer as a single entity, since from a business perspective, the concept of Customer contains all of these things, and the relationships may or may not be enforced in the database.
While I appreciate the quote that "if your business rule is not in your database, it's only a suggestion", I also believe that you shouldn't design the database to enforce business rules, you should design it to be efficient, fast and normalized.
That said, as others have noted above, there is no "perfect design", the tool has to fit the job. But using business entities can really help with maintenance and productivity, since you know where to go to modify business logic, and objects can model real-world concepts in an intuitive way.
Eric,
No one is stopping you from choosing the framework/approach that you would wish. If you are going to go the "data driven/stored procedure-powered" path, then by all means, go for it! Especially if it really, really helps you deliver your applications on-spec and on-time.
The caveat being (a flipside to your question that is), ALL of your business rules should be on stored procedures, and your application is nothing more than a thin client.
That being said, same rules apply if you do your application in OOP : be consistent. Follow OOP's tenets, and that includes creating entity objects to represent your domain models.
The only real rule here is the word consistency. Nobody is stopping you from going DB-centric. No one is stopping you from doing old-school structured (aka, functional/procedural) programs. Hell, no one is stopping anybody from doing COBOL-style code. BUT an application has to be very, very consistent once going down this path, if it wishes to attain any degree of success.
I'm really not sure what you consider "Enterprise Applications". But I'm getting the impression you are defining it as an Internal Application where the RDBMS would be set in stone and the system wouldn't have to be interoperable with any other systems whether internal or external.
But what if you had a database with 100 tables which equate to 4 Stored Procedures for each table just for basic CRUD operations that's 400 stored procedures which need to be maintained and aren't strongly-typed so are susceptible to typos nor can be Unit Tested. What happens when you get a new CTO who is an Open Source Evangelist and wants to change the RDBMS from SQL Server to MySql?
A lot of software today whether Enterprise Applications or Products are using SOA and have some requirements for exposing Web Services, at least the software I am and have been involved with do.
Using your approach you would end up exposing a Serialized DataTable or DataRows. Now this may be deemed acceptable if the Client is guaranteed to be .NET and on an internal network. But when the Client is not known then you should be striving to Design an API which is intuitive and in most cases you would not want to be exposing the Full Database schema.
I certainly wouldn't want to explain to a Java developer what a DataTable is and how to use it. There's also the consideration of Bandwith and payload size and serialized DataTables, DataSets are very heavy.
There is no silver bullet with software design and it really depends on where the priorities lie, for me it's in Unit Testable code and loosely coupled components that can be easily consumed be any client.
just my 2 cents
I'd like to offer another angle to the problem of distance between OO and RDB: history.
Any software has a model of reality that is to some degree an abstraction of reality. No computer program can capture all the complexities of reality, and programs are written just to solve a set of problems from reality. Therefore any software model is a reduction of reality. Sometimes the software model forces reality to reduce itself. Like when you want the car rental company to reserve any car for you as long as it is blue and has alloys, but the operator can't comply because your request won't fit in the computer.
RDB comes from a very old tradition of putting information into tables, called accounting. Accounting was done on paper, then on punch cards, then in computers. But accounting is already a reduction of reality. Accounting has forced people to follow its system so long that it has become accepted reality. That's why it is relatively easy to make computer software for accounting, accounting has had its information model, long before the computer came along.
Given the importance of good accounting systems, and the acceptance you get from any business managers, these systems have become very advanced. The database foundations are now very solid and noone hesitates about keeping vital data in something so trustworthy.
I guess that OO must have come along when people have found that other aspects of reality are harder to model than accounting (which is already a model). OO has become a very successful idea, but persistance of OO data is relatively underdeveloped. RDB/Accounting has had easy wins, but OO is a much larger field (basically everything that isn't accounting).
So many of us have wanted to use OO but we still want safe storage of our data. What can be safer than to store our data the same way as the esteemed accounting system does? It is an enticing prospects, but we all run into the same pitfalls. Very few have taken the trouble to think of OO persistence compared to the massive efforts by the RDB industry, who has had the benefit of accounting's tradition and position.
Prevayler and db4o are some suggestions, I'm sure there are others I haven't heard of, but none have seemed to get half the press as, say, hibernation.
Storing your objects in good old files doesn't even seem to be taken seriously for multiuser applications, and especially web applications.
In my everyday struggle to close the chasm between OO and RDB I use OO as much as possible but try to keep inheritance to a minimum. I don't often use SPs. I'll use the advanced query stuff only in aspects that look like accounting.
I'll be happily supprised when the chasm is closed for good. I think the solution will come when Oracle launches something like "Oracle Object Instance Base". To really catch on, it will have to have a reassuring name.
Not a lot of time at the moment, but just off the top of my head...
The entity model lets you give a consistent interface to the database (and other possible systems) even beyond what a stored procedure interface can do. By using enterprise-wide business models you can make sure that all applications affect the data consistently which is a VERY important thing. Otherwise you end up with bad data, which is just plain evil.
If you only have one application then you don't really have an "enterprise" system, regardless of how big that application or your data are. In that case you can use an approach similar to what you talk about. Just be aware of the work that will be needed if you decide to grow your systems in the future.
Here are a few things that you should keep in mind (IMO) though:
Generated SQL code is bad
(exceptions to follow). Sorry, I
know that a lot of people think that
it's a huge time saver, but I've
never found a system that could
generate more efficient code than
what I could write and often the
code is just plain horrible. You
also often end up generating a ton
of SQL code that never gets used.
The exception here is very simple
patterns, like maybe lookup tables.
A lot of people get carried away on
it though.
Entities <> Tables (or even logical data model entities necessarily). A data model often has data rules that should be enforced as closely to the database as possible which can include rules around how table rows relate to each other or other similar rules that are too complex for declarative RI. These should be handled in stored procedures. If all of your stored procedures are simple CRUD procs, you can't do that. On top of that, the CRUD model usually creates performance issues because it doesn't minimize round trips across the network to the database. That's often the biggest bottleneck in an enterprise application.
Sometimes, your application and data layer are not that tightly coupled. For example, you may have a telephone billing application. You later create a separate application which monitors phone usage to a) better advertise to you b) optimise your phone plan.
These applications have different concerns and data requirements (even the data is coming out of the same database), they would drive different designs. Your code base can end up an absolute mess (in either application) and a nightmare to maintain if you let the database drive the code.
Applications that have domain logic separated from the data storage logic are adaptable to any kind of data source (database or otherwise) or UI (web or windows(or linux etc.)) application.
Your pretty much stuck in your database, which isn't bad if your with a company who is satisfied with the current database system your using. However, because databases evolve overtime there might be a new database system that is really neat and new that your company wants to use. What if they wanted to switch to a web services method of data access (like Service Orientated architecture sometime does). You might have to port your stored procedures all over the place.
Also the domain logic abstracts away the UI, which can be more important in large complex systems that have ever evolving UIs (especially when they are constantly searching for more customers).
Also, while I agree that there is no definitive answer to the question of stored procedures versus domain logic. I'm in the domain logic camp (and I think they are winning over time), because I believe that elaborate stored procedures are harder to maintain than elaborate domain logic. But that's a whole other debate
I think that you are just used to writing a specific kind of application, and solving a certain kind of problem. You seem to be attacking this from a "database first" perspective. There are lots of developers out there where data is persisted to a DB but performance is not a top priority. In lots of cases putting an abstraction over the persistence layer simplifies code greatly and the performance cost is a non-issue.
Whatever you are doing, it's not OOP. It's not wrong, it's just not OOP, and it doesn't make sense to apply your solutions to every othe problem out there.
Interesting question. A couple thoughts:
How would you unit test if all of your business logic was in your database?
Wouldn't changes to your database structure, specifically ones that affect several pages in your app, be a major hassle to change throughout the app?
Good Question!
One approach I rather like is to create an iterator/generator object that emits instances of objects that are relevant to a specific context. Usually this object wraps some underlying database access stuff, but I don't need to know that when using it.
For example,
An AnswerIterator object generates AnswerIterator.Answer objects. Under the hood it's iterating over a SQL Statement to fetch all the answers, and another SQL statement to fetch all related comments. But when using the iterator I just use the Answer object that has the minimum properties for this context. With a little bit of skeleton code this becomes almost trivial to do.
I've found that this works well when I have a huge dataset to work on, and when done right, it gives me small, transient objects that are relatively easy to test.
It's basically a thin veneer over the Database Access stuff, but it still gives me the flexibility of abstracting it when I need to.
The objects in my apps tend to relate one-to-one to the database, but I'm finding using Linq To Sql rather than sprocs makes it much easier writing complicated queries, especially being able to build them up using the deferred execution. e.g. from r in Images.User.Ratings where etc. This saves me trying to work out several join statements in sql, and having Skip & Take for paging also simplifies the code rather than having to embed the row_number & 'over' code.
Why stop at entity objects? If you don't see the value with entity objects in an enterprise level app, then just do your data access in a purely functional/procedural language and wire it up to a UI. Why not just cut out all the OO "fluff"?