If there are any DBAs out there: I'm building a fairly large piece of software, and one of the biggest issues at present is where to put the business logic. While stored procedures would be easier to fix on the fly, the processing requirements would probably slow the DB down tremendously. I also don't want all of the business logic handled by the application, because I want the system to be a "self-sustaining entity" that doesn't require the user front-end to operate.
My idea is to create a service to run on a central server somewhere and have the clients connect through that. The service would maintain all the business logic and serve as a front-end for all the database operations.
Ideas? Yes? No?
I'm willing to accept that I'm also missing some key concepts and need to read some literature.
I would suggest that you keep a keen eye on the difference between what you think of as business logic and what are actually referential integrity constraints.
Make sure all constraints that keep the data meaningfully related are in place at the database layer - i.e., if you need to cascade some deletes or inserts, or when you need to validate some basic data values in order to have everything make sense, these should all be in the database.
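A minimal sketch of rules that belong at the database layer (the schema is made up, not the asker's):

    CREATE TABLE customers (
        customer_id INT PRIMARY KEY
    );

    CREATE TABLE orders (
        order_id    INT PRIMARY KEY,
        customer_id INT NOT NULL
            REFERENCES customers (customer_id)
            ON DELETE CASCADE,       -- deleting a customer cascades to their orders
        quantity    INT NOT NULL
            CHECK (quantity > 0)     -- basic value validation lives with the data
    );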
Then decide if the Client, or the middle layer server, or the database is appropriate for any additional business logic.
What do you mean by "business logic"?
I've seen cases where aggregations and other set-based operations have been done in client code, as well as horrible RBAR (row-by-agonizing-row) operations in SQL that should be somewhere else.
SQL is one tool that has its place: if you're working through large datasets, JOINs, aggregations, etc., then SQL is the place to do it. Anything else is slavish obedience to an SOA ideal.
My approach is to consider what the stored proc or SQL is doing: is it part of the middle tier to avoid set based operations in procedural code, or is it lower as pure data integrity/persistence?
If your business logic is 100% set-based then, arguably, you don't need a middle tier (edit: a client-code-based one) at all, unless it's very thin.
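To make the set-based-versus-RBAR contrast concrete, a sketch (SQL Server flavoured UPDATE...FROM; the tables are hypothetical):

    -- RBAR: the client loops and issues one statement per row - thousands of round trips
    -- UPDATE orders SET total = @line_sum WHERE order_id = @id;   -- repeated per order

    -- Set-based: one statement over the whole set, which is what SQL is built for
    UPDATE o
    SET    o.total = d.line_sum
    FROM   orders o
    JOIN  (SELECT order_id, SUM(amount) AS line_sum
           FROM   order_lines
           GROUP  BY order_id) AS d
      ON   d.order_id = o.order_id;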
Over the years, I've seen client applications come and go, but the database is still there.
So nowadays I use stored procedures for most of the business logic. Three big advantages:
Bug fix deployment takes an instant, with no downtime
Multi-user by default
Far less plumbing code (no data access layer)
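On the first point: a bug fix to a (hypothetical) procedure is one statement against the live database, with no client rebuild or redeploy:

    ALTER PROCEDURE dbo.GetInvoiceTotal
        @InvoiceId INT
    AS
    BEGIN
        -- the fix: multiply instead of add (the bug shipped as quantity + unit_price)
        SELECT SUM(quantity * unit_price) AS invoice_total
        FROM   dbo.invoice_lines
        WHERE  invoice_id = @InvoiceId;
    END;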
Having all business logic on the server side is fine.
Not having it on the server side is fine, too.
In fact, it's up to you.
If a stored procedure tends not to look SQL-ish, you can make it a CLR stored procedure.
Here's a similar question.
I'd highly recommend a traditional n-layer approach, where you have at least a UI layer, a business layer (like a C# assembly or its Java equivalent), and a data access layer. See: http://en.wikipedia.org/wiki/Multitier_architecture.
I worked for a company where all the business logic was in the procs, and maintenance costs were much higher than they needed to be; it limited us to a specific version of SQL Server, it wasn't scalable, etc. In short, unless your application is a simple throw-away kind of thing, I'd not put any business logic in the database.
I've heard a lot lately that SQL is a terrible language, and it seems that every framework under the sun comes pre-packaged with a database abstraction layer.
In my experience though, SQL is often the much easier, more versatile, and more programmer-friendly way to manage data input and output. Every abstraction layer I've used seems to be a markedly limited approach with no real benefit.
What makes SQL so terrible, and why are database abstraction layers valuable?
This is partly subjective. So this is my opinion:
SQL has a pseudo-natural-language style. The inventors believed that they could create a language just like English and that database queries would be very simple. A terrible mistake. SQL is very hard to understand except in trivial cases.
SQL is declarative. You can't tell the database how it should do stuff, just what you want as the result. This would be perfect and very powerful - if you didn't have to care about performance. So you end up writing SQL, reading execution plans, and rephrasing your SQL to try to influence the execution plan - and you wonder why you can't write the execution plan yourself.
Another problem of a declarative language is that some problems are easier to solve in an imperative manner. So you either write them in another language (you'll need standard SQL and probably a data access layer) or by using vendor-specific language extensions, say by writing stored procedures and the like. Doing so, you will probably find that you're using one of the worst languages you've ever seen - because it was never designed to be used as an imperative language.
SQL is very old. It has been standardized, but too late: many vendors had already developed their own language extensions, so SQL ended up with dozens of dialects. That's why applications are not portable, and it's one reason to have a DB abstraction layer.
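A tiny taste of the dialect problem, using nothing fancier than string concatenation (a sample, not an exhaustive list):

    SELECT first_name || ' ' || last_name FROM people;       -- standard SQL, Oracle, PostgreSQL
    SELECT first_name + ' ' + last_name FROM people;         -- SQL Server (T-SQL)
    SELECT CONCAT(first_name, ' ', last_name) FROM people;   -- MySQL, where || means OR by default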
But it's true - there are no feasible alternatives. So we all will use SQL for the next few years.
Aside from everything that was said, a technology doesn't have to be bad to make an abstraction layer valuable.
If you're doing a very simple script or application, you can afford to mix SQL calls in your code wherever you like. However, if you're doing a complex system, isolating the database calls in separate module(s) is a good practice and so it is isolating your SQL code. It improves your code's readability, maintainability and testability. It allows you to quickly adapt your system to changes in the database model without breaking up all the high level stuff, etc.
SQL is great. Abstraction layers over it makes it even greater!
One point of abstraction layers is the fact that SQL implementations tend to be more or less incompatible with each other since the standard is slightly ambiguous, and also because most vendors have added their own (nonstandard) extras there. That is, SQL written for a MySQL DB might not work quite similarly with, say, an Oracle DB — even if it "should".
I agree, though, that SQL is way better than most of the abstraction layers out there. It's not SQL's fault that it's being used for things that it wasn't designed for.
SQL gets badmouthed from several sources:
Programmers who are not comfortable with anything but an imperative language.
Consultants who have to deal with many incompatible SQL-based products on a daily basis
Nonrelational database vendors trying to break the stranglehold of relational database vendors on the market
Relational database experts like Chris Date who view current implementations of SQL as insufficient
If you stick to one DBMS product, then I definitely agree that SQL DBs are more versatile and of higher quality than their competition, at least until you hit a scalability barrier intrinsic in the model. But are you really trying to write the next Twitter, or are you just trying to keep some accounting data organized and consistent?
Criticism of SQL is often a standin for criticisms of RDBMSes. What critics of RDBMSes seem not to understand is that they solve a huge class of computing problems quite well, and that they are here to make our lives easier, not harder.
If they were serious about criticizing SQL itself, they'd back efforts like Tutorial D and Dataphor.
It's not so terrible. It's an unfortunate trend in this industry to rubbish the previous reliable technology when a new "paradigm" comes out. At the end of the day, these frameworks are most probably using SQL to communicate with the database, so how can it be THAT bad? That said, having a "standard" abstraction layer means that a developer can focus on the application code and not the SQL code. Without such a standard layer you'd probably write a lightweight one each time you develop a system, which is a waste of effort.
SQL is designed for the management and querying of set-based data. It is often used to do more, and the edge cases lead to frustration at times.
Actual USE of SQL can be SO impacted by the base database design that the SQL may not be the issue - the design might be. And when you toss in the legacy code associated with a bad design, changes are more disruptive and costly to implement (no one likes to go back and "fix" stuff that is "working" and meeting objectives).
Carpenters can pound nails with hammers, saw lumber with saws and smooth boards with planes. It IS possible to "saw" using hammers and planes, but dang it is frustrating.
I won't say it's terrible. It's unsuitable for some tasks. For example: you cannot write good procedural code with SQL. I was once forced to do complex set manipulation in SQL; it took me a whole weekend to figure that out.
SQL was designed for relational algebra - that's where it should be used.
I've heard a lot lately that SQL is a terrible language, and it seems that every framework under the sun comes pre-packaged with a database abstraction layer.
Note that these layers just convert their own stuff into SQL. For most database vendors SQL is the only way to communicate with the engine.
In my experience though, SQL is often the much easier, more versatile, and more programmer-friendly way to manage data input and output. Every abstraction layer I've used seems to be a markedly limited approach with no real benefit.
… for the reason I just described above.
The database layers don't add anything; they just limit you. They make the queries arguably simpler, but never more efficient.
By definition, there is nothing in the database layers that is not in SQL.
What makes SQL so terrible, and why are database abstraction layers valuable?
SQL is a nice language, however, it takes some brain twist to work with it.
In theory, SQL is declarative, that is you declare what you want to get and the engine provides it in the fastest way possible.
In practice, there are many ways to formulate a correct query (that is, a query that returns correct results).
The optimizers are able to build a Lego castle out of some predefined algorithms (yes, there are several), but they just cannot invent new algorithms. It still takes an SQL developer to assist them.
However, some people expect the optimizer to produce "the best plan possible", not "the best plan available for this query with given implementation of the SQL engine".
And as we all know, when the computer program does not meet people's expectations, it's the program that gets blamed, not the expectations.
In most cases, however, reformulating a query can indeed produce the best plan possible. There are tasks where it's impossible; however, with the new and growing improvements to SQL, these cases get fewer and fewer in number.
It would be nice, though, if the vendors provided some low-level access to functions like "get the index range" or "get a row by the rowid", the way C compilers let you embed assembly right into the language.
I recently wrote an article on this in my blog:
Double-thinking in SQL
I'm a huge ORM advocate and I still believe that SQL is very useful, although it's certainly possible to do terrible things with it (just like anything else).
I look at SQL as a super-efficient language that does not have code re-use or maintainability/refactoring as priorities.
So lightning fast processing is the priority. And that's acceptable. You just have to be aware of the trade-offs, which to me are considerable.
From an aesthetic point of view, as a language I feel that it is lacking some things since it doesn't have OO concepts and so on -- it feels like very old school procedural code to me. But it's far and away the fastest way to do certain things, and that's a powerful niche!
SQL is excellent for certain kinds of tasks, especially manipulating and retrieving sets of data.
However, SQL is missing (or only partially implements) several important tools for managing change and complexity:
Encapsulation: SQL's encapsulation mechanisms are coarse. When you write SQL code, you have to know everything about the implementation of your data. This limits the amount of abstraction you can achieve.
Polymorphism: if you want to perform the same operation on different tables, you've got to write the code twice. (One can mitigate this with imaginative use of views - see the sketch after this list.)
Visibility control: there's no standard SQL mechanism for hiding pieces of the code from one another or grouping them into logical units, so every table, procedure, etc. is accessible from every other one, even when that's undesirable.
Modularity and Versioning
Finally, manually coding CRUD operations in SQL (and writing the code to hook it up to the rest of one's application) is repetitive and error-prone.
A modern abstraction layer provides all of those features, and allows us to use SQL where it's most effective while hiding the disruptive, repetitive implementation details. It provides tools to help overcome the object-relational impedance mismatch that complicates data access in object-oriented software development.
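On the polymorphism point above, a sketch of the view workaround over a made-up schema:

    -- One relational "interface" over two differently shaped tables
    CREATE VIEW all_parties (party_id, display_name) AS
        SELECT customer_id, company_name FROM customers
        UNION ALL
        SELECT employee_id, full_name    FROM employees;

    -- Callers now write the operation once instead of twice
    SELECT display_name FROM all_parties WHERE party_id = 42;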
I would say that a database abstraction layer included with a framework is a good thing because it solves two very important problems:
It keeps the code distinct. By putting the SQL into another layer, which is generally very thin and should only be doing the basics of querying and handoff of results (in a standardized way), you keep your application free from the clutter of SQL. It's the same reason web developers (should) put CSS and Javascript in separate files. If you can avoid it, do not mix your languages.
Many programmers are just plain bad at using SQL. For whatever reason, a large number of developers (especially web developers) seem to be very, very bad at using SQL, or RDBMSes in general. They treat the database (and SQL by extension) as the grubby little middleman they have to go through to get to data. This leads to extremely poorly thought out databases with no indexes, tables stacked on top of tables in dubious manners, and very poorly written queries. Or worse, they try to be too general (Expert System, anyone?) and cannot reasonably relate data in any meaningful way.
Unfortunately, sometimes the way that someone tries to solve a problem and tools they use, whether due to ignorance, stubbornness, or some other trait, are in direct opposition with one another, and good luck trying to convince them of this. As such, in addition to just being a good practice, I consider a database abstraction layer to be a sort of safety net, as it not only keeps the SQL out of the poor developer's eyes, but it makes their code significantly easier to refactor, since all the queries are in one place.
SQL is based on Set Theory, while most high level languages are object oriented these days. Object programmers typically like to think in objects, and have to make a mental shift to use Set based tools to store their objects. Generally, it is much more natural (for the OO programmer) to just cut code in the language of their choice and do something like object.save or object.delete in application code instead of having to write sql queries and call the database to achieve the same result.
Of course, sometimes for complex things, SQL is easier to use and more efficient, so it is good to have a handle on both types of technology.
IMO, the problem people have with SQL has nothing to do with relational design or the SQL language itself. It has to do with the discipline of modeling the data layer, which in many ways is fundamentally different from modeling a business layer or interface. Mistakes in modeling at the presentation layer are generally much easier to correct than at the data layer, where you have multiple applications using the database. These problems are the same as those encountered in modeling a service layer in SOA designs, where you have to account for current consumers of your service and the input and output contracts.
SQL was designed to interact with relational database models. There are other data models that have existed for some time, but the discipline about designing the data layer properly exists regardless of the theoretical model used and thus, the difficulties that developers typically have with SQL are usually related to attempts to impose a non-relational data model onto a relational database product.
For one thing, they make it trivial to use parameterized queries, protecting you from SQL injection attacks. Using raw SQL, from this perspective, is riskier, that is, easier to get wrong from a security perspective. They also often present an object-oriented perspective on your database, relieving you of having to do this translation.
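Parameterization is available from raw SQL too, if you remember to do it by hand - here a T-SQL sketch with sp_executesql and a made-up users table; the abstraction layer's value is that you get this shape by default:

    DECLARE @input NVARCHAR(50) = N'alice';   -- imagine this came from the user

    EXEC sp_executesql
        N'SELECT * FROM users WHERE username = @name',
        N'@name NVARCHAR(50)',
        @name = @input;    -- typed parameter, never spliced into the SQL text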
Heard a lot recently? I hope you're not confusing this with the NoSQL movement. As far as I'm aware, that is mainly a bunch of people who use NoSQL for high-scalability web apps and appear to have forgotten that SQL is an effective tool in non-"high-scalability web app" scenarios.
The abstraction-layer business is just about bridging the difference between object-oriented code and the table/set-based code that SQL likes to talk. Usually this results in writing lots of boilerplate and dull transition code between the two. ORMs automate this and thus save time for business-objecty people.
For an experienced SQL programmer, the bad sides are:
Verbosity
As many have said here, SQL is declarative, which means optimizing is not direct. It's like rallying compared to circuit racing.
Frameworks that try to address all possible dialects and don't support shortcuts of any of them
No easy version control.
For others, the reasons are that
some programmers are bad at SQL. Probably because SQL operates on sets, while programming languages work in the object-oriented or functional paradigm. Thinking in sets (union, product, intersect) is a matter of habit that some people don't have.
some operations aren't self-explanatory: e.g., at first it's not clear that WHERE and HAVING filter different sets.
there are too many dialects
The primary goal of SQL frameworks is to reduce your typing. They somehow do, but too often only for very simple queries. If you try doing something complex, you have to use strings and type a lot. Frameworks that try to handle everything possible, like SQLAlchemy, become too huge, like another programming language.
[update on 26.06.10] Recently I worked with the Django ORM module. This is the only worthy SQL framework I've seen, and it makes working with the database a lot easier. Complex aggregates are a bit harder, though.
SQL is not a terrible language, it just doesn't play too well with others sometimes.
If, for example, you have a system that wants to represent all entities as objects in some OO language or another, then combining this with SQL without any kind of abstraction layer can become rather cumbersome. There's no easy way to map a complex SQL query onto the OO world. To ease the tension between those worlds, additional layers of abstraction are inserted (an OR mapper, for example).
SQL is a really good language for data manipulation. From a developer perspective, what I don't like about it is that changing the database doesn't break your code at compile time. So I use an abstraction which adds this feature, at the price of performance and maybe some expressiveness of the SQL language, because in most applications you don't need all the stuff SQL has.
The other reason why SQL is hated, is because of relational databases.
The CAP Theorem becomes popular:
What goals might you want from a shared-data system?
Strong Consistency: all clients see the same view, even in presence of updates
High Availability: all clients can find some replica of the data, even in the presence of failures
Partition-tolerance: the system properties hold even when the system is partitioned
The theorem states that you can always have only two of the three CAP properties at the same time.
Relational databases address Strong Consistency and Partition-Tolerance.
So more and more people realize that relational databases are not a silver bullet, and more and more people begin to reject them in favor of high availability, because high availability makes horizontal scaling easier. Horizontal scaling gained popularity because we have reached the limit of Moore's law, so the best way to scale is to add more machines.
If relational databases are rejected, SQL is rejected too.
Quick, write me SQL to paginate a dataset that works in MySQL, Oracle, MSSQL, PostgreSQL, and DB2.
Oh, right, standard SQL doesn't define any operators to limit the number of results coming back and which row to start at.
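To spell out what that costs you today, here is rows 21-30 of a sorted result per dialect (version labels are approximate):

    SELECT * FROM t ORDER BY id LIMIT 10 OFFSET 20;                       -- MySQL, PostgreSQL
    SELECT * FROM t ORDER BY id OFFSET 20 ROWS FETCH NEXT 10 ROWS ONLY;   -- SQL Server 2012+, Oracle 12c+, recent DB2
    SELECT * FROM (SELECT s.*, ROWNUM rn                                  -- older Oracle
                   FROM (SELECT * FROM t ORDER BY id) s
                   WHERE ROWNUM <= 30)
    WHERE rn > 20;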
• Every vendor extends the SQL syntax to suit their needs. So unless you're doing fairly simple things, your SQL code is not portable.
• The syntax of SQL is not orthogonal; e.g., the select, insert, update, and delete statements all have completely different syntactical structures.
I agree with your points, but to answer your question, one thing that makes SQL so "terrible" is the lack of complete standardization of SQL between database vendors (SQL Server, Oracle, etc.), which makes SQL code unlikely to be completely portable. Database abstraction layers solve this problem, albeit with a performance cost (sometimes a very severe one).
Living with pure SQL can really be a maintenance hell. For me, the greatest advantage of ORMs is the ability to safely refactor code without tedious "DB refactoring" procedures. There are good unit-testing frameworks and refactoring tools for OO languages, but I have yet to see ReSharper's counterpart for SQL, for example.
Still, all DALs have SQL behind the scenes, and you still need to know it to understand what's happening to your database - but daily work with a good abstraction layer becomes easier.
If you haven't used SQL too much, I think the major problem is the lack of good developer tools.
If you have lots of experience with SQL, you will have, at one point or another, been frustrated by the lack of control over the execution plan. This is an inherent problem in the way SQL was specified to the vendors. I think SQL needs to become a more robust language to truly harness the underlying technology (which is very powerful).
SQL has many flaws, as some other posters here have pointed out. Still, I much prefer to use SQL over many of the tools that people offer as alternatives, because the "simplifications" are often more complicated than the thing they were supposed to simplify.
My theory is that SQL was invented by a bunch of ivory-tower blue-sky thinkers. The whole non-procedural structure sounds great: tell me what you want rather than how you want to do it. But in practice, it's often easier to just give the steps. Often this seems like trying to give car-maintenance instructions by describing how the car should perform when you're done. Yes, you could say, "I want the car to once again get 30 miles per gallon, and to run with this humming sound like this ... hmmmm ... and, etc." But wouldn't it be easier for everyone to just say, "Replace the spark plugs"? And even when you do figure out how to express a complex query in non-procedural terms, the database engine often comes up with a very inefficient execution plan to get there. I think SQL would be much improved by the addition of standardized ways to tell it which table to read first and what index to use.
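For what it's worth, vendors do provide non-standard hints that approximate this; a T-SQL sketch (the index name is made up):

    SELECT c.name, o.total
    FROM   customers c WITH (INDEX (ix_customers_region))   -- force this particular index
    JOIN   orders o ON o.customer_id = c.customer_id
    OPTION (FORCE ORDER);                                    -- join in the written table order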
And the handling of nulls drives me crazy! Yes, theoretically it must have sounded great when someone said, "Hey, if null means unknown, then adding an unknown value to a known value should give an unknown value. After all, by definition, we have no idea what the unknown value is." Theoretically, absolutely true. In practice, if we have 10,000 customers and we know exactly how much money 9,999 owe us but there's some question about the amount owed by the last one, and management says, "What are our total accounts receivable?", yes, the mathematically correct answer is "I don't know". But the practical answer is "we calculate $4,327,287.42 but one account is in question so that number isn't exact". I'm sure management would much rather get a close if not certain number than a blank stare. But SQL insists on this mathematically pristine approach, so for every operation you do, you have to add extra code to check for nulls and handle them specially.
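A sketch of that extra code, against a hypothetical accounts_receivable table - COALESCE for the close-enough figure, plus a count of the uncertain rows:

    SELECT 100 + NULL AS scalar_add;   -- yields NULL: unknown + known = unknown

    SELECT SUM(COALESCE(amount_owed, 0)) AS best_estimate,        -- the $4,327,287.42 figure
           COUNT(*) - COUNT(amount_owed) AS accounts_in_question  -- COUNT(col) skips NULLs
    FROM   accounts_receivable;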
All that said, I'd still rather use SQL than some layer built on top of SQL that just creates another whole set of things I need to learn. And then I have to know that ultimately this will be translated to SQL; sometimes I can just trust it to do the translation correctly and efficiently, but when things get complex I can't, so now I have to know the extra layer, I still have to know SQL, and I have to know how it's going to translate so I can trick the layer into tricking SQL into doing the right thing. Arggh.
There's no love for SQL because SQL is bad in syntax, semantics and current usage. I'll explain:
its syntax is COBOL shrapnel; all the COBOL criticism applies here (to a lesser degree, to be fair). Trying to be natural-language-like without actually attempting to interpret natural language creates arbitrary syntax (is it DROP TABLE <table> or DROP <table>, UPDATE TABLE <table> or UPDATE <table> or UPDATE IN <table>, DELETE <table> or DELETE FROM <table>...) and syntactical monstrosities like the SELECT statement (how many pages does it fill?)
the semantics are also deeply flawed. Date explains it in great detail, but it suffices to note that a three-valued boolean logic doesn't really fit a relational algebra where a row can only be, or not be, part of a table
having a programming language as the main (and often only) interface to databases proved to be a really bad choice and it created a new category of security flaws
I'd agree with most of the posts here that the debate over the utility of SQL is mostly subjective, but I think it's shaped more by the nature of your business needs.
Declarative languages, as Stefan Steinegger has pointed out, are good for specifying what you want, not how you want to do it. This means that your various implementations of SQL are decent from a high-level perspective : that is, if all you want is to get some data and nothing else matters, you can satisfy yourself with writing relatively simple queries, and choosing the implementation of SQL that is right for you.
If you work on a much "lower" level, and you need to optimize all of that yourself, it's far from ideal. Using a further layer of abstraction can help, but if what you're really trying to do is specify the methods for optimizing queries and so forth, it's a little counter intuitive to add a middleman when trying to optimize.
The biggest problem I have with SQL is like other "standardized" languages, there are very few real standards. I'd almost prefer having to learn a whole new language between Sybase and MySQL so that I don't get the two conventions confused.
While SQL does get the job done it certainly has issues...
it tries to simultaneously be the high level and the low level abstraction, and that's ... odd. Perhaps it should have been two or more standards at different levels.
it is a huge failure as a standard. Lots of things go wrong when a standard either stirs in everything, asks too much of implementations, asks too little, or for some reason does not accomplish the partially social goal of motivating vendors and implementors to produce strictly conforming interoperable complete implementations. You certainly cannot say SQL has done any of that. Look at some other standards and note that success or failure of the standard is clearly a factor of the useful cooperation attained:
RS-232 (Bad, not nearly enough specified, even which pin transmits and which pin receives is optional, sheesh. You can comply but still achieve nothing. Chance of successful interop: really low until the IBM PC made a de-facto useful standard.)
IEEE 754-1985 Floating Point (Bad, overreach: not a single supercomputer or scientific workstation or RISC microprocessor ever adopted it, although eventually after 20 years we were able to implement it nicely in HW. At least the world eventually grew into it.)
C89, C99, PCI, USB, Java (Good, whether standard or spec, they succeeded in motivating strict compliance from almost everyone, and that compliance resulted in successful interoperation.)
it failed to be selected for arguably the most important database in the world. While this is more of a datapoint than a reason, the fact that Google Bigtable is not SQL and not relational is kind of an anti-achievement for SQL.
I don't dislike SQL, but I also don't want to have to write it as part of what I am developing. The DAL is not about speed to market - actually, I have never thought that there would be a DAL implementation that would be faster than direct queries from the code. But the goal of the DAL is to abstract. Abstraction comes at a cost, and here it is that it will take longer to implement.
The benefits are huge, though. Writing native tests around the code, using expressive classes, strongly typed datasets, etc. We use a "DAL" of sorts, which is a pure DDD implementation using Generics in C#. So we have generic repositories, unit of work implementations (code based transactions), and logical separation. We can do things like mock out our datasets with little effort and actually develop ahead of database implementations. There was an upfront cost in building such a framework, but it is very nice that business logic is the star of the show again. We consume data as a resource now, and deal with it in the language we are natively using in the code. An added benefit of this approach is the clear separation it provides. I no longer see a database query in a web page, for example. Yes, that page needs data. Yes, the database is involved. But now, no matter where I am pulling data from, there is one (and only one) place to go into the code and find it. Maybe not a big deal on smaller projects, but when you have hundreds of pages in a site or dozens of windows in a desktop application, you truly can appreciate it.
As a developer, I was hired to implement the requirements of the business using my logical and analytical skills - and our framework implementation allows for me to be more productive now. As a manager, I would rather have my developers using their logical and analytical skills to solve problems than to write SQL. The fact that we can build an entire application that uses the database without having the database until closer to the end of the development cycle is a beautiful thing. It isn't meant as a knock against database professionals. Sometimes a database implementation is more complex than the solution. SQL (and in our case, Views and Stored Procs, specifically) are an abstraction point where code can consume data as a service. In shops where there is a definite separation between the data and development teams, this helps to eliminate sitting in a holding pattern waiting for database implementation and changes. Developers can focus on the problem domain without hovering over a DBA and the DBA can focus on the correct implementation without a developer needing it right now.
Many posts here seem to argue that SQL is bad because it doesn't have "code optimization" features, and that you have no control over execution plans.
What SQL engines are good at is to come up with an execution plan for a written instruction, geared towards the data, the actual contents. If you care to take a look beyond the programming side of things, you will see that there is more to data than bytes being passed between application tiers.
When should I be using stored procedures instead of just writing the logic directly in my application? I'd like to reap the benefits of stored procedures, but I'd also like to not have my application logic spread out over the database and the application.
Are there any rules of thumb that you can think of in reference to this?
Wow... I'm going to swim directly against the current here and say, "almost always". There's a laundry list of reasons - some/many of which I'm sure others would argue. But I've developed apps both with and without the use of stored procs as a data access layer, and it has been my experience that well-written stored procedures make it so much easier to write your application. Then there are the well-documented performance and security benefits.
This depends entirely on your environment. The answer to the question really isn't a coding problem, or even an analysis issue, but a business decision.
If your database supports just one application and is reasonably tightly integrated with it, then it's better, for reasons of flexibility, to place your logic inside your application program. Under these circumstances, handling the database simply as a plain data repository using common functionality loses you little and gains flexibility - with vendors, implementation, deployment and much else - and many of the purist arguments that the 'databases are for data' crowd make are demonstrably true.
On the other hand, if you are handling a corporate database, which can generally be identified by having multiple access paths into it, then it is highly advisable to screw down the security as far as you can. At the very least, all appropriate constraints should be enabled, and if possible, access to the data should be through views and procedures only. Whining programmers should be ignored in these cases, as...
With a corporate database the asset is valuable and invalid data or actions can have business-threatening consequences. Your primary concern is safeguarding the business, not how convenient access is for your coders.
Such databases are by definition accessed by more than one application. You need to use the abstraction that stored procedures offer so the database can be changed when application A is upgraded and you don't have the resource to upgrade application B.
Similarly, the encapsulation of business logic in SPs rather than in application code allows changes to such logic to be implemented across the business more easily and reliably than if such logic is embedded in application code. For example, if a tax calculation changes, it's less work, and more robust, if the calculation has to be changed in one SP rather than in multiple applications. The rule of thumb here is that a business rule should be implemented at the closest point to the data where it is unique - so if you have a specialist application, then the logic for that app can be implemented in that app, but logic more widely applicable to the business should be implemented in SPs.
Coders who dive into religious wars over the use or not of SPs generally have worked in only one environment or the other, so they extrapolate their limited experience into a cast-iron position - which indeed will be perfectly defensible and correct in the context from which they come, but misses the big picture. As always, you should make your decision based on the needs of the business/customers/users and not on which type of coding methodology you prefer.
I tend to avoid stored procedures. The debugging tools tend to be more primitive. Error reporting can be harder (vs your server's log file) and, to me at least, it just seems to add another language for no real gain.
There are cases where it can be useful, particularly when processing large amounts of data on the server and of course for database triggers that you can't do in code.
Other than that though, I tend to do everything in code and treat the database as a big dump of data rather than something I run code on.
Consider Who Needs Stored Procedures, Anyways?:
For modern databases and real world usage scenarios, I believe a Stored Procedure architecture has serious downsides and little practical benefit. Stored Procedures should be considered database assembly language: for use in only the most performance critical situations.
and Why I do not use Stored Procedures:
The absolute worst thing you can do, and it's horrifyingly common in the Microsoft development world, is to split related functionality between sproc's and middle tier code. Grrrrrrrr. You just make the code brittle and you increase the intellectual overhead of understanding a system.
I said this in a comment, but I'm going to say it again here.
Security, Security, SECURITY.
When SQL code is embedded in your application, you have to expose the underlying tables to direct access. This might sound okay at first - until you get hit with some SQL injection that scrambles all the varchar fields in your database.
Some people might say that they get around this by using magic quotes or some other way of properly escaping their embedded sql. The problem, though, is the one query a dev didn't escape correctly. Or, the dev that forgot to not allow code to be uploaded. Or, the web server that was cracked which allowed the attacker to upload code. Or,... you get the point. It's hard to cover all your bases.
My point is, all modern databases have security built in. You can simply deny direct table access (selects, inserts, updates, and deletes) and force everything to go through your s'procs. By doing so, generic attacks will no longer work; instead, the attacker would have to take the time to learn the intimate details of your system. This increases their "cost" in terms of time spent and stops drive-by and worm attacks.
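In T-SQL, for instance, that lockdown is a couple of statements (the login, table, and procedure names here are hypothetical):

    -- The application login gets no direct table access...
    DENY SELECT, INSERT, UPDATE, DELETE ON dbo.accounts TO app_user;

    -- ...and must go through the procedures instead
    GRANT EXECUTE ON dbo.TransferFunds TO app_user;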
I know we can't secure ourselves against everything, but if you take the time to architect your apps so that the cost to crack them far outweighs the benefits, then you are going to seriously reduce your potential for data loss. That means taking advantage of all the security tools available to you.
Finally, as to the idea of not using s'procs because you might have to port to a different rdbms: First, most apps don't change database servers. Second, in the event that it's a real possibility, you have to code using ANSI sql anyway; which you can do in your procs. Third, you would have to reevaluate all of your sql code no matter what and it's a whole lot easier when that code is in one place. Fourth, all modern databases now support s'procs. Fifth, when using s'proc's you can custom tune your sql for the database it's running under to take advantage of that particular database's sql extensions.
Basically, use them when you have to perform operations involving data that does not need to leave the database. For example, if you want to update one table with data from another, it makes little sense to pull the data out and then push it back in if you can do it all in one single shot to the DB.
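For example, a single set-based statement (SQL Server flavoured UPDATE...FROM shown; standard SQL would use a correlated subquery or MERGE) against made-up tables:

    -- One round trip instead of SELECT-out, loop, UPDATE-back
    UPDATE p
    SET    p.list_price = s.new_price
    FROM   products p
    JOIN   staging_prices s ON s.product_id = p.product_id;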
Another situation where it may be acceptable to use stored procedures is when you are 100% sure you will never deploy your application to another database vendor. If you are an Oracle shop and you have lots of applications talking to the same database it may make sense to have stored procedures to make sure all of them talk to the db in a consistent manner.
Complicated database queries for me tend to end up as stored procs. Another thought to consider is that your database might be completely separate and distinct from the application. Let's say you run an Oracle DB and you essentially are building an API for other application developers at your organization to call into. You can hide the complicated stuff from them and provide a stored proc in its place.
A very simple example:
registerUser(username, password)
might end up running a few different queries (check if it exists, create entries in a preference table, etc) and you might want to encapsulate them.
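A sketch of what might sit behind it - PL/SQL flavoured, since the answer assumes an Oracle shop, and with an entirely made-up schema:

    CREATE OR REPLACE PROCEDURE register_user(
        p_username IN VARCHAR2,
        p_password IN VARCHAR2
    ) AS
        v_count NUMBER;
    BEGIN
        SELECT COUNT(*) INTO v_count FROM users WHERE username = p_username;
        IF v_count = 0 THEN
            -- real code would hash the password before storing it
            INSERT INTO users (username, password) VALUES (p_username, p_password);
            INSERT INTO user_preferences (username) VALUES (p_username);  -- default prefs
        END IF;
    END register_user;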
Of course, different people will have different perspectives (a DBA versus a Programmer).
I use stored procs in one of three scenarios:
Speed
When speed is of the utmost importance, stored procedures provide an excellent method
Complexity
When I'm updating several tables and the code logic might change down the road, I can update the stored proc and avoid a recompile. Stored procedures are an excellent black box method for updating lots of data in a single stroke.
Transactions
When I'm working on an insert, delete, or update that spans multiple tables, I wrap the whole thing in a transaction. If there is an error, it's very easy to roll back the transaction and throw an error to avoid data corruption.
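A T-SQL sketch of that pattern (TRY/CATCH requires SQL Server 2005+, THROW 2012+; the schema is hypothetical):

    CREATE PROCEDURE dbo.TransferFunds
        @from_id INT, @to_id INT, @amount MONEY
    AS
    BEGIN TRY
        BEGIN TRANSACTION;
        UPDATE dbo.accounts SET balance = balance - @amount WHERE id = @from_id;
        UPDATE dbo.accounts SET balance = balance + @amount WHERE id = @to_id;
        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0
            ROLLBACK TRANSACTION;   -- nothing partial survives
        THROW;                      -- re-raise so the caller sees the error
    END CATCH;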
The bottom two are very doable in code. However, stored procedures provide a black-box method of working when complex and transaction-level operations are important. Otherwise, stick with code-level database operations.
Security used to be one of the reasons. However, with LINQ and other ORMs out there, code level DAL operations are much more secure than they've been in the past. Stored procs ARE secure but so are ORMs like LINQ.
We use stored procedures for all of our reporting needs. They can usually retrieve the data faster and in a way that the report can just spit out directly instead of having to do any kind of calculations or similar.
We also will use stored procedures for complex or complicated queries we need to do that would be difficult to read if they were otherwise inside of our codebase.
It can also be very useful as a matter of encapsulation and in the philosophy of DRY. For instance, I use stored functions for calculations inside a table that I need for several queries inside the code. This way I get the better performance as well as the assurance that the calculation is always done the same way.
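For instance, a scalar function (T-SQL flavoured, hypothetical schema) keeps the calculation defined in exactly one place:

    CREATE FUNCTION dbo.LineTotal (@quantity INT, @unit_price MONEY, @discount DECIMAL(4,2))
    RETURNS MONEY
    AS
    BEGIN
        RETURN @quantity * @unit_price * (1 - @discount);
    END;

    -- Every query now computes it the same way:
    -- SELECT order_id, dbo.LineTotal(quantity, unit_price, discount) FROM order_lines;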
I would not use it for higher functionality or logic that should be in the business logic layer of an architecture, but rather keep it focused on the model layer, where the functionality is clearly tied to the database design, with the possible flexibility of changing the database design without breaking the API to the other layers.
I tend to always use stored procedures. Personally, I find it makes everything easier to maintain. Then there is the security and performance considerations.
Just make sure you write clean, well laid out and well documented stored procedures.
When all the code is in a stored proc, it is far easier to refactor the database when needed. Changes to logic are far easier to push as well. It is also far far easier to performance tune and sooner or later performance tuning becomes necessary for most database applications.
From my experience, stored procedures can be very useful for building reporting databases/pipelines, however, I'd argue that you should avoid using stored procedures within applications as they can impede a team's velocity and any security risks of building queries within an application can be mitigated by the use of modern tooling/frameworks.
Why might we avoid it?
To avoid tight-coupling between applications and databases. If we use stored procedures, we won't be able to easily change our underlying database in the future because we'd have to either:
Migrate stored procedures from one database (e.g. DB2) to another (e.g. SQL Server) which could be painstakingly time-consuming or...
Migrate all the queries to the applications themselves (or potentially in a shared library)
Because code-first is a thing. There are several ORMs which can enable us to target any database and even manage the table schemas without ever needing to touch the database. ORMs such as Entity Framework or Dapper allow developers to focus on building features instead of writing stored procedures and wiring them up in the application.
It's yet another thing that developers need to learn in order to be productive. Instead, they can write the queries as part of the applications which makes the queries far simpler to understand, maintain, and modify by the developers who are building new features and/or fixing bugs.
Ultimately, it depends on what developers are most comfortable with.
If a developer has a heavy SQL background, they might go with Stored Procs.
If a developer has lots of app development experience, they might prefer queries in code. Personally, I think having queries in code can enable developers to move much faster and security concerns can be mitigated by ensuring teams are following best practices (e.g. parameterized queries, ORM). Stored procs aren't a "silver bullet" for system security.
Does the use of procedures still make sense in 202X?
Maybe in low-level and rare scenarios, or if we write code for legacy companies with unfounded restrictions, stored procedures could be an option.
If entire logic is in the database, should I need a dba to change it?
No. On modern platforms, requiring a DBA just to change business logic is not acceptable.
Hot modification of stored procedures, without dev or staging phases, is a crazy idea.
How easy is it to maintain a procedure with dozens of lines, cursors and other low-level database features, versus OOP objects in any modern language that a junior developer is able to maintain? This answers itself.
Hiding tables from my development team for security reasons sounds very crazy to me, in these times when agility and good documentation are everything.
A modern development team with a modern database should not have to worry about security in this way. What's more, the team needs access to a sandbox version of the database to reduce the time of its deliverables.
With modern ORMs, ESBs, ETLs and the constant increase of CPU power, stored procedures are not an option anymore. Should I invest time and money in these tools, only to end up with one big stored procedure?
Of course not.
On top of the speed and security considerations, I tend to stick as much in Stored Procedures as possible for ease of maintenance and alterations. If you put the logic in your application, and find later that sql logic has an error or needs to work differently in some manner, you have to recompile and redeploy the whole app in many cases (especially if it's a client side app such as WPF, Win-Forms, etc). If you keep the logic in the stored proc, all you have to do is update the proc and you never have to touch the application.
I agree that they should be used often and well.
The use case I find extremely compelling and extremely useful is when you are taking in a lot of raw information that should be separated out into several tables, where some of the data may have records that already exist and need to be connected by foreign key ID. Then you can just do IF EXISTS checks, inserting if the record doesn't exist or returning its key if it does, which makes everything more uniform, succinct, and maintainable in the long run.
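A sketch of that check-and-connect step for a single incoming value (T-SQL flavoured, made-up names):

    DECLARE @vendor_name NVARCHAR(100) = N'Acme', @vendor_id INT;

    IF EXISTS (SELECT 1 FROM vendors WHERE name = @vendor_name)
        SELECT @vendor_id = vendor_id FROM vendors WHERE name = @vendor_name;  -- reuse existing key
    ELSE
    BEGIN
        INSERT INTO vendors (name) VALUES (@vendor_name);
        SET @vendor_id = SCOPE_IDENTITY();                                     -- key of the new row
    END;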
The only case where I would suggest against using them is if you are doing a lot of logic or number crunching between queries which is best done in the app server OR if you are working for a company where keeping all of the logic in the code is important for maintainability/understanding what is happening. If you have a git repository full of everything anyone would need and is easily understandable, that can be very valuable.
Stored procedures are a method of collecting operations that should be done together on the database side, while still keeping them on the database side.
This includes:
Populating several tables from one rowsource
Checking several tables against different business rules
Performing operations that cannot be efficiently performed using set-based approach
etc.
The main problem with stored procedures is that they are hard to maintain.
You, therefore, should make stored procedures as easy to maintain as all your other code.
I have an article on this in my blog:
Schema junk
I've had some very bad experiences with this.
I'm not opposed to stored procedures in their place, but gratuitous use of stored procedures can be very expensive.
First, stored procedures run on the database server. That means that if you have a multi-server environment with 50 webservers and one database server, instead of spreading workloads over 50 cheap machines, you load up one expensive one (since the database server is commonly built as a heavyweight server). And you're risking creating a single-point-of-failure.
Secondly, it's not very easy to write an application solely in stored procedures, although I ran into one that made a superhuman effort to try. So you end up with something that's expensive to maintain: it's implemented in two different programming languages, and the source code is often not all in one place either, since stored procedures are definitively stored in the DBMS and not in a source archive - assuming that someone ever managed or bothered to pull them out of the database server and source-archive them at all.
So aside from a fairly messy app architecture, you also limit the set of qualified chimpanzees who can maintain it, as multiple skills are required.
On the other hand, stored procedures are extremely useful, IF:
You need to maintain some sort of data integrity across multiple systems. That is, the stored logic doesn't belong to any single app, but you need consistent behavior from all participating apps. A certain amount of this is almost inevitable in modern-day apps in the form of foreign keys and triggers, but occasionally, major editing and validation may be warranted as well.
You need performance that can only be achieved by running logic on the database server itself and not as a client. But, as I said, when you do that, you're eating into the total system resources of the DBMS server. So it behooves you to ensure that if there are significant bits of the offending operation that CAN be offloaded onto clients, you can separate them out and leave the most critical stuff for the DBMS server.
A particular scenario where you're likely to benefit is the "(n+1)" scalability problem. Any kind of multidimensional/hierarchical situation is likely to involve it.
Another scenario involves use cases that follow some protocol when handling the tables (hint: defined steps in which transactions are likely to be involved); these could benefit from locality of reference: being on the server, the queries might run faster. On the other hand, you could supply a batch of statements directly to the server, especially when you're in an XA environment and have to access federated databases.
If you are talking business logic rather than just "should I use sprocs in general", I would say you should put business logic in sprocs when you are carrying out large set-based operations, or any other time executing the logic would require a large number of calls to the DB from the app.
It also depends on your audience. Is ease of installation and portability across DBMSs important to you?
If your program should be easy to install and easy to run on different database systems then you should stay away from stored procedures and also look out for non-portable SQL in your code.
Which is better? Or should you use an OR mapper with SPs? If you have a system with SPs already, is an OR mapper worth it?
I like ORM's because you don't have to reinvent the wheel. That being said, it completely depends on your application needs, development style and that of the team.
This question has already been covered Why is parameterized SQL generated by NHibernate just as fast as a stored procedure?
There is nothing good to be said about stored procedures. They were a necessity 10 years ago, but every single benefit of using sprocs is no longer valid. The two most common arguments are regarding security and performance. The "sending stuff over the wire" crap doesn't hold either; I can certainly create a query dynamically to do everything on the server too. One thing the sproc proponents won't tell you is that it makes updates impossible if you are using column conflict resolution on a merge publication. Only DBAs who think they are the database overlord insist on sprocs, because it makes their job look more impressive than it really is.
This has been discussed at length on previous questions.
What are the pros and cons to keeping SQL in Stored Procs versus Code
At my work, we mostly do line of business apps - contract work.
For this type of business, I'm a huge fan of ORM. About four years ago (when the ORM tools were less mature), we studied up on CSLA and rolled our own simplified ORM tool that we use in most of our applications, including some enterprise-class systems that have 100+ tables.
We estimate that this approach (which of course includes a lot of code generation) creates a time savings of up to 30% in our projects. Seriously, it's ridiculous.
There is a small performance trade-off, but it's insubstantial as long as you have a decent understanding of software development. There are always exceptions that require flexibility.
For instance, extremely data-intensive batch operations should still be handled in specialized sprocs if possible. You probably don't want to send 100,000 huge records over the wire if you could do it in a sproc right on the database.
This is the type of problem that newbie devs run into whether they're using ORM or not. They just have to see the results and if they're competent, they will get it.
What we've seen in our web apps is that usually the most difficult-to-solve performance bottlenecks are no longer database-related, even with ORM. Rather, they're on the front-end (browser) due to bandwidth, AJAX overhead, etc. Even mid-range database servers are incredibly powerful these days.
Of course, other shops who work on much larger high-demand systems may have different experiences there. :)
Stored procedures, hands down. OR mappers are language-specific and often add drastic slowdowns.
Stored procedures means you're not limited by the language interface, and you can merely tack on new interfaces to the database in forwards compatible ways.
My personal opinion of OR Mappers is their existence highlights a design flaw in the popular structure of databases. Database developers should realize the tasks people are trying to achieve with complicated OR-Mappers and create server-side utilities that assist in performing this task.
OR Mappers are also epic targets of the "leaky abstraction" syndrome (Joel on Software: Leaky Abstractions): it's quite easy to find things they just can't handle, because the abstraction layer isn't psychic.
Stored procedures are better, in my view, because they can have an independent security configuration from the underlying tables.
This means you can allow specific operations without allowing writes/reads to specific tables. It also limits the damage that people can do if they discover a SQL injection exploit.
Definitely ORMs. More flexible, more portable (generally they tend to have portability built in). In case of slowness you may want to use caching or hand-tuned SQL in hot spots.
Generally stored procedures have several problems with maintainability.
they are separate from the application (so many changes now have to be made in two places)
generally harder to change
harder to put under version control
harder to make sure they're updated (deployment issues)
portability (already mentioned)
I personally have found that SP's tend to be faster performance wise, at least for the large data items that I execute on a regular basis. But I know many people that swear by OR tools and wouldn't do ANYTHING else.
I would argue that using an OR mapper will increase readability and maintainability of your applications source code, while using SP will increase the performance of the application.
They are not actually mutually exclusive, though to your point they usually are so.
The advantage of using object-relational mapping is that you can swap out data sources - not only the database structure, but any data source. With the advent of web services / service-oriented architecture / ESBs in a larger corporation, it would be wise to consider a higher-level separation of concerns than what you can get with stored procedures. However, in smaller companies and in applications that will never use a different data source, SPs can fit the bill fine. And one last point: it is not necessary to use an OR mapper to get the abstraction. My former team had great success by simply using an adapter model, using Spring.NET to plug in the data source.
# Kent Fredrick
"My personal opinion of OR Mappers is their existence highlights a design flaw in the popular structure of databases."
I think you're talking about the difference between the relational model and the object-oriented model. This is actually why we need ORMs, but the implementations of these models were done on purpose - it is not a design flaw - it is just how things turned out historically.
Use stored procedures where you have identified a performance bottleneck. If you haven't identified a bottleneck, what are you doing with premature optimisation?
Use stored procedures where you are concerned about security access to a particular table.
Use stored procs when you have a SQL wizard who is prepared to sit and write complex queries that join together loads of tables in a legacy database - to do the things that are hard in an OR mapper.
Use the OR mapper for the other (at least) 80% of your database: where the selects and updates are so routine as to make access through stored procedures alone a pointless exercise in manual coding, and where updates are so infrequent that there is no performance cost. Use an OR mapper to automate the easy stuff.
Most OR mappers can talk to stored procs for the rest.
You should not use stored procs on the assumption that they're faster than a SQL statement in a string; this is not necessarily the case in the last few versions of MS SQL Server.
You do not need stored procs to thwart SQL injection attacks; there are other ways to make sure that your query parameters are strongly typed and not just string-concatenated.
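For instance, an ordinary parameterized statement keeps values typed and out of the SQL text. A minimal PostgreSQL-flavoured sketch (the table and column names are made up; most client libraries expose the same mechanism as prepared statements or bind parameters):

    -- The parameter is bound as a typed value, never spliced into the
    -- query string, so a value like O'Brien cannot break out of a literal.
    PREPARE find_user (text) AS
        SELECT id, name FROM users WHERE name = $1;
    EXECUTE find_user('O''Brien');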
You don't need to use an OR mapper to get a POCO domain model, but it does help.
If you already have a data API that's exposed as sprocs, you'd need to justify a major architectural overhaul to go to ORM.
For a greenfield build, I'd evaluate several things:
If there's a dedicated DBA on the team, I'd lean to sprocs
If there's more than one application touching the same DB I'd lean to sprocs
If there's no possibility of database migration ever, I'd lean to sprocs
If I'm trying to implement MVCC in the DB, I'd lean to sprocs
If I'm deploying this as a product with potentially multiple backend DBs (MySQL, MSSQL, Oracle), I'd lean to ORM
If I'm on a tight deadline, I'd lean to ORM, since it's a faster way to create my domain model and keep it in sync with the data model (with appropriate tooling).
If I'm exposing the same domain model in multiple ways (web app, web service, RIA client), I'll lean to ORM, since the data model is then hidden behind my ORM facade, and a robust domain model is more valuable to me.
I think performance is a bit of a red herring; Hibernate seems to perform nearly as well as or better than hand-coded SQL (due to its caching tiers), and it's easy to write a bad query in your sproc either way.
The most important criteria are probably the team's skillset and long-term database portability needs.
Well, the SPs are already there; it doesn't really make sense to can them. The question, I guess, is: does it make sense to use a mapper with SPs?
"I'm trying to drive in a nail. Should I use the heel of my shoe or a glass bottle?"
Both stored procedures and ORMs are difficult and annoying for a developer to use (though not necessarily for a DBA or architect, respectively), because they incur a start-up cost and a higher maintenance cost that don't guarantee a pay-off.
Both will pay off well if the requirements aren't expected to change much over the lifespan of the system, but they will get in your way if you're building the system to discover the requirements in the first place.
Straight-coded SQL or a quasi-ORM like LINQ or ActiveRecord is better for build-to-discover projects (which happen in the enterprise a lot more often than the PR wants you to think).
Stored Procedures are better in a language-agnostic environment, or where fine-grained control over permissions is required. They're also better if your DBA has a better grasp of the requirements than your programmers.
Full-blown ORMs are better if you do Big Design Up Front, use lots of UML, want to abstract the database back-end, and your architect has a better grasp of the requirements than either your DBA or programmers.
And then there's option #4: Use all of them. A whole system is not usually just one program, and while many programs may talk to the same database, they could each use whatever method is appropriate both for the program's specific task, and for its level of maturity. That is: you start with straight-coded SQL or LINQ, then mature the program by refactoring in ORM and Stored Procedures where you see they make sense.
I've been using PostgreSQL a little bit lately, and one of the things that I think is cool is that you can use languages other than SQL for scripting functions and whatnot. But when is this actually useful?
For example, the documentation says that the main use for PL/Perl is that it's pretty good at text manipulation. But isn't that more of something that should be programmed into the application?
Secondly, is there any valid reason to use an untrusted language? It seems like making it so that any user can execute any operation would be a bad idea on a production system.
PS. Bonus points if someone can make PL/LOLCODE seem useful.
#Mike: this kind of thinking makes me nervous. I've heard too many times "this should be infinitely portable", but when the question is asked ("do you actually foresee that there will be any porting?"), the answer is: no.
Sticking to the lowest common denominator can really hurt performance, as can the introduction of abstraction layers (ORMs, PHP PDO, etc.). My opinion is:
Evaluate realistically whether there is a need to support multiple RDBMSs. For example, if you are writing an open-source web application, chances are you need to support at least MySQL and PostgreSQL (if not MSSQL and Oracle)
After the evaluation, make the most of the platform you decided upon
And BTW: you are mixing relational with non-relational databases (CouchDB is not an RDBMS comparable with Oracle, for example), which further exemplifies the point that the perceived need for portability is often greatly overestimated.
"isn't that [text manipulation] more of something that should be programmed into the application?"
Usually, yes. The generally accepted "three-tier" application design for databases says that your logic should be in the middle tier, between the client and the database. However, sometimes you need some logic in a trigger or need to index on a function, requiring that some code be placed into the database. In that case all the usual "which language should I use?" questions come up.
If you only need a little logic, the most portable language should probably be used (pl/pgSQL). If you need to do some serious programming, though, you might be better off using a more expressive language (maybe pl/ruby). This will always be a judgment call.
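To make the "more expressive language" point concrete, here is a hypothetical PL/Perl function doing the kind of text manipulation that is clumsy in pl/pgSQL (the function name and behaviour are made up for illustration):

    -- Hypothetical sketch: collapse whitespace, trim, and lower-case a
    -- string in one pass. Requires: CREATE EXTENSION plperl;
    CREATE FUNCTION squish(text) RETURNS text AS $$
        my ($s) = @_;
        $s =~ s/\s+/ /g;         # collapse runs of whitespace
        $s =~ s/^\s+|\s+$//g;    # trim both ends
        return lc $s;            # lower-case the result
    $$ LANGUAGE plperl;

    SELECT squish('  Hello,   WORLD  ');  -- 'hello, world'

Something like this could back a trigger or an expression index, which is exactly the situation where the code has to live in the database.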
"is there any valid reason to use an untrusted language?"
As above, yes. Again, putting direct file access (for example) into your middle tier is best when possible, but if you need to fire things off based on triggers (that might need access to data not available directly to your middle tier), then you need untrusted languages. It's not ideal, and should generally be avoided. And you definitely need to guard access to it.
These days, any "unique" or "cool" feature in a DBMS makes me incredibly nervous. I break out in a rash and have to stop work until the itching goes away.
I just hate to be locked in to a platform unnecessarily. Suppose you build a big chunk of your system in PL/Perl inside the database. Or in C# within SQL Server, or PL/SQL within Oracle; there are plenty of examples*.
Now you suddenly discover that your chosen platform doesn't scale. Or isn't fast enough. Or something. Worse, there's a new kid on the database block (something like MonetDB, CouchDB, or Caché, say, but much cooler) that would solve all your problems (even if your only problem, like mine, is having an uncool database platform). And you can't switch to it without recoding half your application.
(*Admittedly, the paid-for products are to some extent seeking to lock you in by persuading you to use their unique features, which is not an accusation that can directly be levelled at the free providers, but the effect is the same).
So that's a rant on the first part of the question. Heart-felt, though.
"is there any valid reason to use an untrusted language? It seems like making it so that any user can execute any operation would be a bad idea"
My goodness, yes it does! A sort of "Perl injection attack"? Almost worth doing it just to see what happens, I'd have thought.
For philosophical reasons outlined above I think I'll pass on the PL/LOLCODE challenge. Although I was somewhat amazed to discover it was a link to something extant.
From my perspective, I guess the answer is 'it depends'.
There is an argument that manipulation of the data belongs in the database layer, so that the business logic does not need to be concerned with how the manipulation happens; it just knows that it has happened.
Another very good reason to process data in the DB layer is when the volume of data being crunched means that network bandwidth becomes an issue. I once had to categorise very large amounts of data; processing it in the application layer was severely restricted by the time required to transfer all the data across the network.
I then wrote a binning algorithm in PL/pgSQL and it worked much faster.
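I don't have that original code any more, but a minimal PL/pgSQL sketch of the idea would look something like this (the table, column, and bin width are all hypothetical):

    -- Hypothetical sketch: assign each value to a fixed-width bin on the
    -- server, so only the bin counts cross the network.
    CREATE TABLE measurements (value numeric);

    CREATE FUNCTION bin_counts(bin_width numeric)
    RETURNS TABLE (bin int, n bigint) AS $$
    BEGIN
        RETURN QUERY
            SELECT floor(value / bin_width)::int, count(*)
            FROM measurements
            GROUP BY 1
            ORDER BY 1;
    END;
    $$ LANGUAGE plpgsql;

    SELECT * FROM bin_counts(10.0);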
Regarding untrusted languages, I heard a podcast with Josh Berkus (a Postgres advocate) who discussed an application of PostgreSQL that brought in data from MySQL as part of its processing, so that the communication itself was handled by the Postgres server. I don't remember the full details; I think it was on the FLOSS Weekly podcast, which has quite an interesting discussion of the history of PostgreSQL and some of the uses it is put to.
The untrusted versions of the procedural languages allow you to access I/O on the system.
This can come in handy if you need a trigger or something similar to send an email, or to connect to a socket server and push a popup notification. There are tons of uses for this type of thing, and because of PostgreSQL's isolation levels you can do things like this fairly safely.
You can structure the function so that the email (or whatever) is only sent after every step that might fail, so if the transaction aborts first, nothing goes out. The nice thing about doing this is that it removes the logic from the client and puts it on the server.
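As a tiny illustration of the kind of I/O an untrusted language unlocks, here is a hypothetical plperlu function that appends to a file on the server's filesystem, something a trusted language simply cannot do (the path and function name are made up):

    -- Hypothetical sketch. Requires: CREATE EXTENSION plperlu; (superuser)
    CREATE FUNCTION log_to_file(msg text) RETURNS void AS $$
        my ($msg) = @_;
        open(my $fh, '>>', '/var/log/app/audit.log')
            or die "cannot open audit log: $!";
        print $fh scalar(localtime), " $msg\n";
        close($fh);
    $$ LANGUAGE plperlu;

    SELECT log_to_file('nightly batch finished');

Bear in mind that this kind of side effect is not transactional: rolling back the surrounding transaction will not unsend an email or un-write a log line, which is why the ordering described above matters.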
I think most additional languages are offered so that if you develop in one of them on a regular basis, you can feel comfortable writing DB functions, triggers, etc. The usefulness of these features is that they provide control over the data as close to the data as possible.
An example of a useful stored procedure I recently wrote in an external language, which would not have been possible in pl/pgsql, is a version of 'df' that allowed SQL table generators to pick the tablespace with the most free space available at runtime.
I used plperlu, and it was relatively simple, although I had to be careful with data typing.
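I don't have the original function to hand, but a hypothetical plperlu sketch of the same idea might look like this (the parsing is simplified and assumes POSIX 'df -Pk' output):

    -- Hypothetical sketch: shell out to df and return each mount point
    -- with its free space, so SQL can pick the emptiest one.
    CREATE FUNCTION disk_free()
    RETURNS TABLE (mount text, free_kb bigint) AS $$
        my @rows;
        for my $line (split /\n/, `df -Pk`) {
            # POSIX columns: fs, 1K-blocks, used, available, use%, mount
            my @f = split /\s+/, $line;
            next unless defined $f[3] && $f[3] =~ /^\d+$/;  # skip header
            push @rows, { mount => $f[5], free_kb => $f[3] };
        }
        return \@rows;
    $$ LANGUAGE plperlu;

    -- Pick the location with the most room:
    SELECT mount FROM disk_free() ORDER BY free_kb DESC LIMIT 1;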