SQL Server Views, blessing or curse? [closed]

I once worked with an architect who banned the use of SQL views. His main reason was that views made it too easy for a thoughtless coder to needlessly involve joined tables which, if that coder tried harder, could be avoided altogether. Implicitly he was encouraging code reuse via copy-and-paste instead of encapsulation in views.
The database had nearly 600 tables and was highly normalised, so most of the useful SQL was necessarily verbose.
Several years later I can see at least one bad outcome from the ban - we have many hundreds of dense, lengthy stored procs that verge on unmaintainable.
In hindsight I would say it was a bad decision, but what are your experiences with SQL views? Have you found them bad for performance? Any other thoughts on when they are or are not appropriate?

There are some very good uses for views; I have used them a lot for tuning and for exposing less normalized sets of information, or for UNION-ing results from multiple selects into a single result set.
Obviously any programming tool can be used incorrectly, but I can't think of a time in my experience when a poorly tuned view caused any kind of performance drawback, and the value they provide through explicitly tuned selects and by avoiding duplication of complex SQL code can be significant.
Incidentally, I have never been a fan of architectural "rules" that are based on keeping developers from hurting themselves. These rules often have unintended side-effects -- the last place I worked didn't allow using NULLs in the database, because developers might forget to check for null. This ended up forcing us to work around "1/1/1900" dates and integers defaulted to "0" in all the software built against the databases, and introducing a litany of bugs caused by devs working around places where NULL was the appropriate value.

You've answered your own question:
he was encouraging code reuse via copy-and-paste
Reuse the code by creating a view. If the view performs poorly, it will be much easier to track down than if you have the same poorly performing code in several places.

Not a big fan of views (can't remember the last time I wrote one) but wouldn't ban them entirely either. If your database allows you to put indexes on views and not just on tables, you can often improve performance a good bit, which makes them more attractive. If you are using views, make sure to look into indexing them.
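For example, here is a minimal sketch of a SQL Server indexed view; the table, column, and index names are invented for illustration:
-- Hypothetical indexed view pre-aggregating order totals.
-- SCHEMABINDING and COUNT_BIG(*) are required before a view can be indexed;
-- this also assumes OrderAmount is declared NOT NULL.
CREATE VIEW dbo.vOrderTotals
WITH SCHEMABINDING
AS
SELECT CustomerId,
       COUNT_BIG(*)     AS OrderCount,
       SUM(OrderAmount) AS TotalAmount
FROM dbo.Orders
GROUP BY CustomerId;
GO

-- The unique clustered index is what actually materializes the view.
CREATE UNIQUE CLUSTERED INDEX IX_vOrderTotals ON dbo.vOrderTotals (CustomerId);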
I really only see the need for views for partitioning data and for extremely complex joins that are really critical to the application (thinking of financial reports here where starting from the same dataset for everything might be critical). I do know some reporting tools seem to prefer views over stored procs.
I am a big proponent of never returning more records or fields than you need in a specific instance and the overuse of views tends to make people return more fields (and in way too many cases, too many joins) than they need which wastes system resources.
I also tend to see that people who rely on views (not the developer of the view - the people who only use them) often don't understand the database very well (so they would get the joins wrong if not using the view) and that to me is critical to writing good code against the database. I want people to understand what they are asking the db to do, not rely on some magic black box of a view. That is all personal opinion of course, your mileage may vary.
Like BlaM I personally haven't found them easier to maintain than stored procs.
Edited in Oct 2010 to add:
Since I originally wrote this, I have had occasion to work with a couple of databases designed by people who were addicted to using views. Even worse, they used views that called views that called views (to the point where eventually we hit the limit on the number of tables that can be referenced). This was a performance nightmare. It took 8 minutes to get a simple count(*) of the records in one view, and much longer to get data. If you use views, be very wary of using views that call other views. You will be building a system that will very probably not work under normal production load. In SQL Server you can only index views that do not call other views, so what ends up happening when you use views in a chain is that the entire record set has to be built for each view, and it is not until you get to the last one that the where clause criteria are applied. You may need to generate millions of records just to see three. You may join to the same table 6 times when you really only need to join to it once, and you may return many more columns than you need in the final result set.

My current database was awash with countless small tables of no more than 5 rows each. Well, I could count them, but it was cluttered. These tables simply held constant-type values (think enum) and could very easily be combined into one table. I then made views that simulated each of the tables I deleted to ensure backward compatibility. It worked great.
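A minimal sketch of that kind of consolidation, with invented names:
-- Hypothetical consolidated lookup table replacing many tiny enum-style tables.
CREATE TABLE dbo.LookupValue (
    LookupType varchar(50)  NOT NULL,  -- e.g. 'OrderStatus', 'PaymentMethod'
    LookupCode int          NOT NULL,
    LookupText varchar(100) NOT NULL,
    CONSTRAINT PK_LookupValue PRIMARY KEY (LookupType, LookupCode)
);

-- A view that simulates one of the deleted tables, for backward compatibility.
CREATE VIEW dbo.OrderStatus
AS
SELECT LookupCode AS OrderStatusId,
       LookupText AS OrderStatusName
FROM dbo.LookupValue
WHERE LookupType = 'OrderStatus';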

One thing that hasn't been mentioned thus far is use of views to provide a logical picture of the data to end users for ad hoc reporting or similar.
This has two merits:
It allows the user to see single "tables" containing the data they expect, rather than requiring relatively non-technical users to work out potentially complex joins (because the database is normalised).
It provides a means to allow some degree of ad hoc access without exposing the data or the structure to the end users.
Even with non-ad-hoc reporting, it's sometimes significantly easier to provide the reporting system with a view containing the relevant data, neatly separating production of data from presentation of the same.

Like any powerful tool, views have their own dark side. However, you cannot blame views for somebody writing badly performing code. Moreover, views can limit the exposure of some columns and provide extra security.

Views are good for ad-hoc queries, the kind that a DBA does behind the scenes when he/she needs quick access to data to see what's going on with the system.
But they can be bad for production code. Part of the reason is that it's somewhat unpredictable what indexes you will need with a view, since the where clause can be different, and it is therefore hard to tune. Also, you are generally returning a lot more data than is actually necessary for the individual queries that are using the view. Each of these queries could be tightened up and tuned individually.
There are specific uses of views in cases of data partitioning that can be extremely useful, so I'm not saying they should be avoided altogether. I'm just saying that if a view can be replaced by a few stored procedures, you will be better off without the view.

We use views for all of our simple data exports to CSV files. This avoids writing a package with the SQL embedded in it, which becomes cumbersome and hard to debug.
Using views, we can execute a view and see exactly what was exported, no cruft or unknowns. It greatly helps in troubleshooting problems with improper data exports and hides any complex joins behind the view. Granted, we work with a very old legacy TERMS-based system that exports to SQL, so the joins are a little more complex than usual.

Some time ago I tried to maintain code that used views built from views built from views... That was a pain in the a**, so I got a little allergic to views :)
I usually prefer working with tables directly, especially for web applications where speed is a main concern. When accessing tables directly you have the chance to tweak your SQL-Queries to achieve the best performance. "Precompiled"/cached working plans might be one advantage of views, but in many cases just-in-time compilation with all given parameters and where clauses in consideration will result in faster processing over all.
However that does not rule out views totally, if used adequately. For example you can use a view that joins the "users" table with the "users_status" table to get a textual explanation for each status - if you need it. However if you don't need the explanation: use the "users" table, not the view. As always: use your brain!
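The kind of view meant here might look something along these lines (column names assumed):
-- Hypothetical view joining users to their status text;
-- use it only when the textual status is actually needed.
CREATE VIEW users_with_status AS
SELECT u.id,
       u.name,
       u.status_id,
       s.description AS status_text
FROM users u
JOIN users_status s ON s.id = u.status_id;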

Views have been helpful to us in their role for use by public web-based applications that draw from a production database. Simplified security is the primary advantage we see, since the table design in the database may combine sensitive and non-sensitive data within the same table. A stored procedure shares much of this advantage, but a view is read-only, has potential interop advantages, and is a less complex thing for junior people to implement.
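As a rough sketch (table, column, and role names invented), such a view simply omits the sensitive columns and is granted to the web application's role instead of the base table:
-- Hypothetical: expose only non-sensitive columns to the public web role.
-- dbo.Customer is assumed to also hold SSN, CreditLimit, and similar columns.
CREATE VIEW dbo.CustomerPublic
AS
SELECT CustomerId, DisplayName, City, Region
FROM dbo.Customer;
GO

GRANT SELECT ON dbo.CustomerPublic TO WebAppRole;
DENY  SELECT ON dbo.Customer       TO WebAppRole;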
This security abstraction advantage also applies when views are used for end-user ad-hoc queries; this would be less of an advantage if we had a proper, flattened, data warehouse representation of our data.

From the standpoint of an application which uses an ORM, it's a lot harder to execute a custom query than to do a select on a discretely mapped type (e.g., the view).
For example, if you need just 5 fields of a table that has many (say 30 or 40) an ORM framework will create an entity to represent the table.
That means that even though you only need a few properties of the entity, the select query generated by the ORM framework will bring the entire entity in its full glory. A view on the other hand, although also mapped to an entity with the ORM framework, will only bring the data you need.
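For instance (purely illustrative names), a view over a wide table can expose only what one screen needs, and the ORM then maps it as its own lightweight entity:
-- Hypothetical: ProductSummary exposes 5 of Product's 30+ columns.
CREATE VIEW dbo.ProductSummary
AS
SELECT ProductId, Name, ListPrice, CategoryId, IsActive
FROM dbo.Product;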
Second, since ORM frameworks map entities to tables, relationships between entities are generated (and hydrated) on the client side, meaning that the query has to execute and return to the app before linking of those entities can happen at runtime within the app.
Some frameworks bypass that by returning the data from multiple linked entities in a giant select (with multiple joins), bringing in the columns of all related tables in one call. Internally the framework disassembles the giant result set and structures the logical presentation of the linked entities before returning those entities to the caller app.
The point is that views are a lifesaver for apps using an ORM. The alternative is to make DB calls manually and manually map the resulting recordsets into usable entities/models.
While this approach works and definitely produces a result, it has lots of negative facets. Manual code... is manual: hard to maintain, cumbersome to implement, and it causes devs to worry more about the specifics of the DB provider API than about the logical domain model. Not to mention that it increases time to production (it's a lot more laborious), along with the costs of development and maintenance and the surface area for bugs.
So for anyone saying views are bad, please consider the other side of things: the stuff that high-and-mighty DBAs most often have no clue about.

Let's see if I can come up with a lame analogy ...
"I don't need a phillips screwdriver. I carry a flat head and a grinder!"
Dismissing views out of hand will cause pain long term. For one, it's easier to debug and modify a single view definition than it is to ship modified code.

Views can also reduce the size of complex queries (in the same way stored procs can).
This can reduce network bandwidth for very busy databases.

Related

Select * vs specific columns & loading object properties

I always thought SELECT * was bad and that you should always return only the columns you are going to use. One of the reasons for this is that the DB can return the result without hitting any tables if all the columns needed are in the index.
I have a factory class that loads the properties of a Product object. It loads all the properties every time GetProduct is called, etc.
Many of the pages won't be using all of the Product properties, even though they will all be loaded from the database because of the SELECT *.
Is there any design advice/guidelines on this?
The trade-off here is between squeaking out every last bit of potential performance versus code maintainability. There is no question that bringing back columns you won't use wastes some CPU cycles. The question becomes: how many? Then you have to consider what is more expensive, your wasted CPU cycles or your programmers' time for building and maintaining the code?
If you are working on a system with huge performance requirements then it may very well pay to optimize your ORM / factory code. On the other hand, if you're building a departmental line of business app and you've got scores or hundreds of ORM classes, maybe you are better off keeping it simple for the programmers (and the people who have to pay for them) and stop worrying about a few cycles. This becomes even more the case if you use a framework that scaffolds up most of your ORM code for you with code generation - like Entity Framework (or many others)...
If you are building your system without the use of any kind of code generating framework, and if your data access layer is pretty close to bare metal SQL then only bringing back what you need is good advice. If you are building an app that is going to be used by thousands or millions of people simultaneously, then by all means tune your SQL from the outset. If, on the other hand, you work in a shop that uses ORM frameworks and RAD or agile then writing dozens of SQLs is counter productive.
I'd definitely avoid SELECT *. Just retrieve the data you know you'll need. I'd prefer to write a dozen queries to the same table, where each one refers to just the few columns I need for a particular purpose, rather than write one query that retrieves all the columns and just use that everywhere.
Even if you know you need every column currently in a table, list each one explicitly. That way, if someone adds half a dozen more columns to the table in the future, all your old queries won't suddenly be retrieving more data than is needed.

Are modifiable join views a reasonable design choice?

To be clear, by modifiable join view I mean a view constructed from the joining of two or more tables that allows insert/update/delete actions that modify any/all of the component tables.
This may be a postgres specific question, not sure. I am also interested if other DBMSs have idiosyncratic features for modifiable join views, since as far as I can tell, they are not possible in standard SQL.
I'm working on a postgres schema, and some of my recent reading has suggested that it is possible to construct modifiable join views using instead rules (CREATE RULE ... DO INSTEAD ...). Modifiable join views seem desirable since it would allow for hiding strong normalization behind an interface, providing a mechanism for classic abstraction. Rules are the only option for implementation, since currently triggers cannot be set on views.
However, the first modifiable view I tried to design ran into problems, and I found out that many consider non-trivial rules to be harmful (see links in comments to this SO answer). Also, I can't find any examples of modifiable join views on the web.
Questions (Edit to put finer points on the questions):
Do you have any experience with modifiable join views and can you provide a concrete example with select/insert/delete/update ability?
Are they practical, i.e. can they be treated transparently without having to tiptoe around mines/black holes?
Are they ever a good design choice, in terms of functionality/effort ratio and maintainability?
Would greatly appreciate links to any examples/discussions on this topic. Thanks.
Yes, I have some experience with updatable views in general. I think they're practical in PostgreSQL. Like all design choices, they can be a good choice, and they can be a bad choice.
I find them particularly useful in dealing with supertype/subtype tables. I create one view for each subtype; the view joins the subtype to the supertype. Revoke permissions on the base tables, write rules for the view, and give client code access only to the views. All data manipulation done by client code then goes through the view and the rules defined on them.
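A minimal sketch of what that can look like in PostgreSQL; the supertype "party" and subtype "person" are invented names, and only the INSERT rule is shown:
-- Hypothetical supertype/subtype pair; the client supplies party_id
-- explicitly, to sidestep the well-known rule/sequence pitfalls.
CREATE TABLE party  (party_id integer PRIMARY KEY, party_type text NOT NULL);
CREATE TABLE person (party_id integer PRIMARY KEY REFERENCES party,
                     full_name text NOT NULL);

-- The view that client code is allowed to touch.
CREATE VIEW person_v AS
SELECT p.party_id, p.full_name
FROM person p
JOIN party pt ON pt.party_id = p.party_id;

-- Rewrite an INSERT on the view into inserts on both base tables.
CREATE RULE person_v_insert AS
ON INSERT TO person_v DO INSTEAD (
    INSERT INTO party  (party_id, party_type) VALUES (NEW.party_id, 'person');
    INSERT INTO person (party_id, full_name)  VALUES (NEW.party_id, NEW.full_name);
);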
I don't think rules are really different from any other feature in any other environment. And by environment, I mean C, C++, Java, Ruby, Python, Erlang, and BASIC, not just dbms environments.
Use the good features of a language. Avoid the bad ones.
"Don't use malloc()" is bad advice. "Always check the return value of malloc()" is good advice. "Never use rules" is bad advice. "Avoid using rules in ways that are known to have questionable behavior" is good advice. The rules you need for views on supertype/subtype tables are simple and easy to understand. They don't misbehave.
At the theoretical level, views provide logical data independence. But that's only possible if the views are updatable. (And many views should be updatable directly by the database engine, without any need of rules or triggers.)
I use them as a replacement for ORMs. I think as long as you do not run amok sprinkling them everywhere through the database, they can be easy enough to understand. I define a schema for an application, and then whatever views are in that schema are the methods and operations of that app. The client code can be mostly automated after that, since the views give the abstraction I need to write generic client code.
People point out that the rule rewrite is not a real table (though it is posing as one), which makes it possible to write things that will break. This is possible, but I have yet to come across it. The idea is to hide the complexity in the rewrite and then only do simple deletes and updates with no joins. If it turns out that a join is needed, it is time to rewrite the rule, not the top-level query.
In the end, I find it a very compact way to write the database. All the ways of interfacing with it are written as rules. No connection should have access to a real table. Your business logic is very explicit. If a view does not have an UPDATE rule, it cannot be updated, period. Since you have written all this at the database level instead of the client level, it is not tied to a web framework or a particular language. This leads to a lot of flexibility in how you want to connect to the database. Imagine you used a web framework, but as time goes on you need direct access to the database from another source. Direct access would also bypass all of the ORM business rules you worked so hard on. With a rule-based interface you can expose the interface without fear that the new connection will corrupt the data.
If people say you can really F UP a database with them - then sure, of course you can. But you can with everything else too. If people say you cannot use them at all without mucking things up, then I would disagree.
Two quick links:
Why using rules is a bad idea
Triggers on views
My personal preference is to use views only for reading data, (virtually) never for inserting or updating. By essentially re-normalizing data (which sounds like what you are doing) in your database, you are likely creating a system that will be very difficult to test and maintain in the long term.
If at all possible, look at mapping your denormalized data back to a normal schema somewhere in your application code, and providing it to the database that way (to individual tables IMHO) in a single transaction.
I know in SQL Server if you update a view you must limit the change to only one table anyway which makes using views for updating useless in my mind as you have to know which fields go with which tables anyway.
If you want to abstract the information out and not have to worry about the database structure for inserts and updates, an ORM might do a better job for you than views.
I have never used modifiable views of any sort but as you are asking whether they are a "reasonable design choice", can I suggest an alternative design choice with many benefits where modifiable views are not needed: a Transactional API
Basically what this amounts to is:
Users have no access to tables and cannot issue insert, update, delete statements at all
Users have access to functions that represent well-defined transactions (see the sketch after this list) - at the simplest level these may just do a single DML statement, but often they would not. The important thing is that they map to transactions in the 'business' sense rather than in the 'database' sense
For querying, users have access to (non-modifiable) views
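A sketch of one such function in PostgreSQL terms, since that is the asker's platform (table and column names are invented):
-- Hypothetical 'business' transaction: ship an order.
-- Callers never touch the orders or stock tables directly.
CREATE FUNCTION ship_order(p_order_id integer)
RETURNS void
LANGUAGE plpgsql
SECURITY DEFINER   -- runs with the owner's rights, not the caller's
AS $$
BEGIN
    UPDATE orders
       SET status = 'shipped', shipped_at = now()
     WHERE order_id = p_order_id
       AND status = 'paid';

    IF NOT FOUND THEN
        RAISE EXCEPTION 'Order % is not in a shippable state', p_order_id;
    END IF;

    UPDATE stock s
       SET quantity = s.quantity - oi.quantity
      FROM order_items oi
     WHERE oi.order_id = p_order_id
       AND s.product_id = oi.product_id;
END;
$$;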
I usually do views in the form of "last valid record", just hiding and tracking modifications (like a wiki).
The only drawback that I see is this: you then use your view as a table, join it with anything, use it in WHERE clauses, insert records into it, and so on, but behind the scenes you lose a lot of performance compared to the same actions against a real table (which is bigger and more complex). I think it depends on how many people must understand the schema. It's true that some DBMSs also allow indexing views, but I think you still lose a significant amount of performance anyway.

Should I be concerned that ORMs, by default, return all columns?

In my limited experience in working with ORMs (so far LLBL Gen Pro and Entity Framework 4), I've noticed that inherently, queries return data for all columns. I know NHibernate is another popular ORM, and I'm not sure that this applies with it or not, but I would assume it does.
Of course, I know there are workarounds:
Create a SQL view and create models and mappings on the view
Use a stored procedure and create models and mappings on the result set returned
I know that adhering to certain practices can help mitigate this:
Ensuring your row counts are reasonably limited when selecting data
Ensuring your tables aren't excessively wide (large number of columns and/or large data types)
So here are my questions:
Are the above practices sufficient, or should I still consider finding ways to limit the number of columns returned?
Are there other ways to limit returned columns other than the ones I listed above?
How do you typically approach this in your projects?
Thanks in advance.
UPDATE: This sort of stems from the notion that SELECT * is thought of as a bad practice. See this discussion.
One of the reasons to use an ORM of nearly any kind is to delay a lot of those lower-level concerns and focus on the business logic. As long as you keep your joins reasonable and your table widths sane, ORMs are designed to make it easy to get data in and out, and that requires having the entire row available.
Personally, I consider issues like this premature optimization until encountering a specific case that bogs down because of table width.
First of all: great question, and about time someone asked this! :-)
Yes, the fact an ORM typically returns all columns for a database table is something you need to take into consideration when designing your systems. But as you've mentioned - there are ways around this.
The main fact for me is to be aware that this is what happens - either a SELECT * FROM dbo.YourTable, or (better) a SELECT (list of all columns) FROM dbo.YourTable.
This is not a problem when you really want the whole object and all its properties, and as long as you load a few rows, that's fine, too - the convenience beats the raw performance.
You might need to think about changing your database structures a little bit - things like:
maybe put large columns like BLOBs into separate tables with a 1:1 link to your base table - that way, a select on the parent table doesn't grab all those large blobs of data (see the sketch after this list)
maybe put groups of columns that are optional, that might only show up in certain situations, into separate tables and link them - again, just to keep the base tables lean'n'mean
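The first of those points might look like this (hypothetical names, SQL Server syntax):
-- Hypothetical split: keep Document lean and push the blob into a 1:1 side table.
CREATE TABLE dbo.Document (
    DocumentId int IDENTITY(1,1) PRIMARY KEY,
    Title      nvarchar(200) NOT NULL,
    CreatedAt  datetime2 NOT NULL DEFAULT sysutcdatetime()
);

CREATE TABLE dbo.DocumentContent (
    DocumentId int PRIMARY KEY REFERENCES dbo.Document (DocumentId),
    Content    varbinary(max) NOT NULL
);
-- Map the ORM entity to dbo.Document; load DocumentContent lazily, on demand.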
Also: avoid trying to "arm-wrestle" your ORM into doing bulk operations - that's just not their strong point.
And: keep an eye on performance, and try to pick an ORM that allows you to change certain operations into e.g. stored procedures - Entity Framework 4 allows this. So if the deletes are killing you - maybe you just write a Delete stored proc for that table and handle that operation differently.
The question here covers your options fairly well. Basically you're limited to hand-crafting the HQL/SQL. It's something you want to do if you run into scalability problems, but if you do in my experience it can have a very large positive impact. In particular, it saves a lot of disk and network IO, so your scalability can take a big jump. Not something to do right away though: analyse then optimise.
Are there other ways to limit returned columns other than the ones I listed above?
NHibernate lets you add projections to your queries so you wouldn't need to use views or procs just to limit your columns.
For me this has only been an issue if the table has lots of columns (more than 30) or if a column holds a lot of data, for example over 5000 characters in a field.
The approach I have used is to just map another object to the existing table but with only the fields I need. So for a search that populates a table with 100 rows I would have a MyObjectLite, but when I click to view the details of that row I would call GetById and return a MyObject that has all the columns.
Another approach is to use custom SQL or stored procs, but I only think you should go down this path if you REALLY need the performance gain and have users complaining. So unless there is a performance problem, do not waste your time trying to fix a problem that does not exist.
You can limit the number of returned columns by using a Projection, Transformers.AliasToBean and a DTO. Here is how it looks in the Criteria API:
// Assuming an open NHibernate ISession and a mapped Package entity:
var dtos = session.CreateCriteria(typeof(Package))
    .SetProjection(Projections.ProjectionList()
        .Add(Projections.Property("Id"), "Id")
        .Add(Projections.Property("PackageName"), "Caption"))
    .SetResultTransformer(Transformers.AliasToBean(typeof(PackageNameDTO)))
    .List<PackageNameDTO>();
In LLBLGen Pro, you can return Typed Lists which not only allow you to define which fields are returned but also allow you to join data so you can pull a custom list of fields from multiple tables.
Overall, I agree that for most situations, this is premature optimization.
One of the big advantages of using LLBLGen and other ORMs as well (I just feel confident speaking about LLBLGen because I have used it since its inception) is that the performance of the data access has been optimized by folks who understand the issues better than your average bear.
Whenever they figure out a way to further speed up their code, you get those changes "for free" just by re-generating your data layer or by installing a new dll.
Unless you consider yourself an expert at writing data access code, ORMs probably improve most developers' efficacy and accuracy.

Pros and cons of putting logic in SQL? [closed]

At a new job, I've just been exposed to the concept of putting logic into SQL statements.
In MySQL, a dumb example would be like this:
SELECT
    P.LastName, IF(P.LastName='Baldwin','Michael','Bruce') AS FirstName
FROM
    University.PhilosophyProfessors P
-- This is like a ternary operator; if the condition is true, it returns
-- the first value; else the second value. So if a professor's last name
-- is 'Baldwin', we will get their first name as "Michael"; otherwise, "Bruce"**
For a more realistic example, maybe you're deciding whether a salesperson qualifies for a bonus. You could grab various sales numbers and do some calculations in your SQL query, and return true / false as a column value called "qualifies."
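For instance (invented table, columns, and threshold), the query itself could compute the flag:
-- Hypothetical: let the query decide who qualifies for a bonus.
SELECT s.SalesPersonId,
       SUM(s.Amount) AS TotalSales,
       CASE WHEN SUM(s.Amount) >= 25000 THEN 1 ELSE 0 END AS Qualifies
FROM Sales s
WHERE s.SaleDate >= '2010-01-01'
GROUP BY s.SalesPersonId;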
Previously, I would have gotten all the sales data back from the query, then done the calculation in my application code.
To me, this seems better, because if necessary, I can walk through the application logic step-by-step with a debugger, but whatever the database is doing is a black box to me. But I'm a junior developer, so I don't know what's normal.
What are the pros and cons of having the database server do some of your calculations / logic?
**Code example based on Monty Python sketch.
This way SQL becomes part of your domain model. It's one more (and not necessarily obvious) place where domain knowledge is implemented. Such leaks result in tighter coupling between business logic / application code and database, what usually is a bad idea.
One exception is views, report queries etc. But these usually are so isolated that it's obvious what role they play.
One of the most persuasive reasons to push logic out to the database is to minimise traffic. In the example given, there is little gain, since you are fetching the same amount of data whether the logic is in the query or in your app.
If you want to fetch only users with a first name of Michael, then it makes more sense to implement the logic on the server. Actually, in this simple example, it doesn't make much difference, since you could just specify users whose last name is Baldwin. But consider a more interesting problem, whereby you give each user a "popularity" score based on how common their first and last names are, and you want to fetch the 10 most "popular" users. Calculating "popularity" in the app would mean that you have to fetch every single user before ranking, sorting and choosing them locally. Calculating it on the server means you can fetch just 10 rows across the wire.
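A rough sketch of the server-side version, with an assumed schema:
-- Hypothetical: rank users by how common their first and last names are,
-- and send only the top 10 across the wire.
SELECT u.id, u.first_name, u.last_name,
       fn.cnt + ln.cnt AS popularity
FROM users u
JOIN (SELECT first_name, COUNT(*) AS cnt FROM users GROUP BY first_name) fn
     ON fn.first_name = u.first_name
JOIN (SELECT last_name, COUNT(*) AS cnt FROM users GROUP BY last_name) ln
     ON ln.last_name = u.last_name
ORDER BY popularity DESC
LIMIT 10;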
There aren't a lot of absolute pros and cons to this argument, so the answer is 'it depends.' Some scenarios with different conditions that affect this decision might be:
Client-server app
One example of a place where it might be appropriate to do this is an older 4GL or rich client application where all database operations were done through stored procedure based update, insert, delete sprocs. In this case the gist of the architecture was to have the sprocs act as the main interface for the database and all business logic relating to particular entities lived in the one place.
This type of architecture is somewhat unfashionable these days but at one point it was considered to be the best way to do it. Many VB, Oracle Forms, Informix 4GL and other client-server apps of the era were done like this and it actually works fairly well.
It's not without its drawbacks, however - SQL is not particularly good at abstraction, so it's quite easy to wind up with fairly obtuse SQL code that presents a maintenance issue through being hard to understand and not as modular as one might like.
Is it still relevant today? Quite often a rich client is the right platform for an application and there's certainly plenty of new development going on with Winforms and Swing. We do have good open-source ORMs today where a 1995 vintage Oracle Forms app might not have had the option of using this type of technology. However, the decision to use an ORM is certainly not a black and white one - Fowler's Patterns of Enterprise Application Architecture does quite a good job of running through a range of data access strategies and discussing their relative merits.
Three tier app with rich object model
This type of app takes the opposite approach, and places all of the business logic in the middle-tier model object layer with a relatively thin database layer (or perhaps an off-the-shelf mechanism like an ORM). In this case you are attempting to place all the application logic in the middle tier. The data access layer has relatively little intelligence, except perhaps for a handful of stored procedures needed to get around the limits of an ORM.
In this case, SQL based business logic is kept to a minimum as the main repository of application logic is the middle-tier.
Overnight batch processes
If you have to do a periodic run to pick out records that match some complex criteria and do something with them, it may be appropriate to implement this as a stored procedure. For something that may have to go over a significant portion of a decent-sized database, a sproc-based approach is probably going to be the only reasonably performant way to do this sort of thing.
In this case SQL may well be the appropriate way to do this, although traditional 3GLs (particularly COBOL) were designed specifically for this type of processing. In really high-volume environments (particularly mainframes), doing this type of processing with flat or VSAM files outside a database may be the fastest way to do it. In addition, some jobs may be inherently record-oriented and procedural, or may be much more transparent and maintainable if implemented in this way.
To paraphrase Ed Post, 'you can write COBOL in any language' - although you might not want to. If you want to keep it in the database, use SQL, but it's certainly not the only game in town.
Reporting
The nature of reporting tools tends to dictate the means of encoding business logic. Most are designed to work with SQL based data sources so the nature of the tool forces the choice on you.
Other domains
Some applications like ETL processing may be a good fit for SQL. ETL tools start to get unwieldy if the transformation gets too complex, so you may want to go for a stored procedure based architecture. Mixing queries and transformations across extraction, ETL processing and stored-proc based processing can lead to a transformation process that is hard to test and troubleshoot.
Where you have a significant portion of your logic in sprocs it may be better to put all of the logic in this as it gives you a relatively homogeneous and modular code base. In fact I have it on fairly good authority that around half of all data warehouse projects in the banking and insurance sectors are done this way as an explicit design decision - for precisely this reason.
Many times the answer to this type of question is going to depend a great deal on deployment approach. Where it makes the most sense to place your logic depends on what you'll need to be able to get access to when making changes.
In the case of web applications that aren't compiled, it can be easier to deal with changes to a page or file than it is to work with queries (depending on query complexity, programming backgrounds / expertise, etc). In these kinds of situations, logic in the scripting language is typically OK and may make it easier to revise later.
In the case of desktop applications that require more effort to modify, placing this kind of logic in the database where it can be adjusted without requiring a recompilation of the application may benefit you. If there was a decision made that people used to qualify for bonuses at 20k, but now must make 25k, it'd be much easier to adjust that on the SQL Server than to recompile your accounting application for all of your users, for example.
I'm a strong advocate of putting as much logic as possible directly into the database. That means incorporating it in views and stored procedures. I believe that most follows the DRY principle.
For example, consider a table with FirstName and LastName columns, and an application that frequently makes use of a FullName field. You have three choices:
Query first and last name and compute the full name in application code.
Query first, last, and (first || last) in your application's SQL whenever you query the table.
Define a view CustomerExt that includes the first and last columns, and a computed full name column and then query against that view, rather than the customer table.
I believe option 3 is clearly correct. Consider the addition of a MiddleInitial field to the table and the full name computation. Using option 3, you simply need to replace the view and every application across your company will instantly use the new format for FullName. The view still makes the base columns available for those instances in which you need to do some special formatting, but for the standard instance everything works "automatically".
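A sketch of option 3 (SQL Server syntax, table and column names assumed), including the later MiddleInitial addition:
-- Hypothetical CustomerExt view: one place defines what "FullName" means.
CREATE VIEW dbo.CustomerExt
AS
SELECT CustomerId,
       FirstName,
       MiddleInitial,
       LastName,
       FirstName
         + COALESCE(' ' + MiddleInitial + '.', '')
         + ' ' + LastName AS FullName
FROM dbo.Customer;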
That's a simple case, but the principle is the same for more complex situations. Perform application- or company-wide data logic directly in the database and you do not need to concern yourself with keeping different applications up to date.
The answer depends on your expertise and your familiarity with the technologies involved. Also, if you're a technical manager, it depends on your analysis of the skills of the people working on your team and whom you intend on hiring / keeping on staff to support, extend and maintain the application in future.
If you are not literate and proficient in the database (as you are not), then stick with doing it in code. If, on the other hand, you are literate and proficient in database coding (as you should be), then there is nothing wrong (and a lot right) about doing it in the database.
Two other considerations that might influence your decision are whether the logic is of such a complex nature that doing it in database code would be inordinately more complex or more abstract than in application code, and second, whether the process involved requires data from outside the database (from some other source). In either of these scenarios I would consider moving the logic to a code module.
The fact that you can step through the code in your IDE more easily is really the only advantage to your post-processing solution. Doing the logic in the database server reduces the sizes of result sets, often drastically, which leads to less network traffic. It also allows the query optimizer to get a much better picture of what you really want done, again often allowing better performance.
Therefore I would nearly always recommend SQL logic. If you treat a database as a mere dumb store, it will return the favor by behaving dumb, and depending on the situation, that can absolutely kill your performance - if not today, possibly next year when things have taken off...
That particular first example is a bad idea. Per-row functions do not scale well as the table gets bigger. In fact, a (likely) better way to do it would be to index LastName and use something like:
SELECT P.LastName, 'Michael' AS FirstName
FROM University.PhilosophyProfessors P
WHERE P.LastName = 'Baldwin'
UNION ALL SELECT P.LastName, 'Bruce' AS FirstName
FROM University.PhilosophyProfessors P
WHERE P.LastName <> 'Baldwin'
On databases where data are read more often than written (and that's most of them), these sorts of calculations should be done at write time such as using an insert/update trigger to populate a real FirstName field.
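A sketch of that write-time approach in MySQL, to match the question (assuming a real FirstName column exists):
-- Hypothetical trigger: compute FirstName once, when the row is written.
-- A matching BEFORE UPDATE trigger would keep it in sync on changes.
CREATE TRIGGER set_first_name
BEFORE INSERT ON University.PhilosophyProfessors
FOR EACH ROW
SET NEW.FirstName = IF(NEW.LastName = 'Baldwin', 'Michael', 'Bruce');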
Databases should be used for storing and retrieving data, not doing massive non-databasey calculations that will slow down everything.
One big pro: a query may be all you can work with. Reports have been mentioned: many reporting tools or reporting plugins to existing programs only allow users to make their own queries (the results of which they will display).
If you cannot alter the code (because it isn't yours), you may yet be able to alter a query. And in some cases (data migration), you'll be writing queries to do migration as well.
I like to distinguish data vs business rules, and push the data rules into the stored procs as much as possible. There is not always a hard and fast distinction between the two, but in your example of calculating sales bonuses, the formula itself might be a business rule but the work of gathering and aggregating the various figures used in the formula is a data rule.
Sometimes, though, it depends on the deployment model and change control procedures. If the sales formula changes frequently and deployment of the business layer code is cumbersome, then tweaking just one function/stored proc in the database would be a great solution.
I'm a big fan of elegant database queries because the code is closer to the data and SQL works very well. But such queries, whether they're text in your app, generated by an OR mapper or stored in the database, are harder to test, especially in the cloud, because you need a database to run against.
Database is exactly what it's called. DATABASE.
You should not mix the business logic with data layer.
Keep them separate, as any close coupling between data and business logic makes it impossible to follow best practices in programming.
I was recently working on a project where all the logic was in MS SQL. A horrible idea that backfired after a few years (an energy company): no easy way to scale out, no easy way to adopt CI/CD, Agile, or code repositories. Very difficult to collaborate on, very slow and very inefficient.
The company was basically hitting hardware limits to make it work (they spent £100k on an SSD SAN), while you could have reached the same performance with C# for the business logic and the database kept for data, on perhaps 3-4 cheap servers that could easily scale out.
Horrible, horrible idea. Guess what? The company went under. Once the SQL Server had reached its limits (some queries were running for hours - very well written, but SQL is not for business logic, end of story), it one day failed to bill all the Direct Debit customers and basically didn't take the monthly payments the company needed to survive until the next month (millions of pounds).

SQL Server 2005: Wrapping Tables by Views - Pros and Cons

Background
I am working on a legacy small-business automation system (inventory, sales, procurement, etc.) that has a single database hosted by SQL Server 2005 and a bunch of client applications. The main client (used by all users) is an MS Access 2003 application (ADP), and other clients include various VB/VBA applications like Excel add-ins and command-line utilities.
In addition to 60 or so tables (mostly in 3NF), the database contains about 200 views, about 170 UDFs (mostly scalar and table-valued inline ones), and about 50 stored procedures. As you might have guessed, some portion of so-called "business logic" is encapsulated in this mass of T-SQL code (and thus is shared by all clients).
Overall, the system's code (including the T-SQL code) is not very well organized and is very refactoring-resistant, so to speak. In particular, schemata of most of the tables cry for all kinds of refactorings, small (like column renamings) and large (like normalization).
FWIW, I have pretty long and decent application development experience (C/C++, Java, VB, and whatnot), but I am not a DBA. So, if the question will look silly to you, now you know why it is so. :-)
Question
While thinking about refactoring all this mess (in a piecemeal fashion of course), I've come up with the following idea:
For each table, create a "wrapper" view that (a) has all the columns that the table has; and (b) in some cases, has some additional computed columns based on the table's "real" columns.
A typical (albeit simplistic) example of such additional computed column would be sale price of a product derived from the product's regular price and discount.
Reorganize all the code (both T-SQL and VB/VBA client code) so that only the "wrapper" views refer to tables directly.
So, for example, even if an application or a stored procedure needed to insert/update/delete records from a table, they'd do that against the corresponding "table wrapper" view, not against the table directly.
So, essentially this is about isolating all the tables by views from the rest of the system.
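A sketch of one such wrapper, with invented names:
-- Hypothetical wrapper view: every "real" column plus a derived SalePrice.
-- Inserts and updates through it still work, as long as they touch only
-- the base columns.
CREATE VIEW dbo.ProductView
AS
SELECT ProductId,
       ProductName,
       RegularPrice,
       DiscountPct,
       RegularPrice * (1 - DiscountPct / 100.0) AS SalePrice
FROM dbo.Product;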
This approach seems to provide a lot of benefits, especially from maintainability viewpoint. For example:
When a table column is to be renamed, it can be done without rewriting all the affected client code at once.
It is easier to implement derived attributes (easier than using computed columns).
You can effectively have aliases for column names.
Obviously, there must be some price for all these benefits, but I am not sure that I am seeing all the catches lurking out there.
Did anybody try this approach in practice? What are the major pitfalls?
One obvious disadvantage is the cost of maintaining "wrapper" views in sync with their corresponding tables (a new column in a table has to be added to a view too; a column deleted from a table has to be deleted from the view too; etc.). But this price seems to be small and fair for making the overall codebase more resilient.
Does anyone know any other, stronger drawbacks?
For example, usage of all those "wrapper" views instead of tables is very likely to have some adverse performance impact, but is this impact going to be substantial enough to worry about it? Also, while using ADODB, it is very easy to get a recordset that is not updateable even when it is based just on a few joined tables; so, are the "wrapper" views going to make things substantially worse? And so on, and so forth...
Any comments (especially shared real experience) would be greatly appreciated.
Thank you!
P.S. I stumbled upon the following old article that discusses the idea of "wrapper" views:
The Big View Myth
The article advises to avoid the approach described above. But... I do not really see any good reasons against this idea in the article. Quite the contrary, in its list of good reasons to create a view, almost each item is exactly why it is so tempting to create a "wrapper" view for each and every table (especially in a legacy system, as a part of refactoring process).
The article is really old (1999), so whatever reasons were good then may be no longer good now (and vice versa). It would be really interesting to hear from someone who considered or even tried this idea recently, with the latest versions of SQL Server and MS Access...
When designing a database, I prefer the following:
no direct table access by the code (but it is OK from stored procedures, views, and functions)
a base view for each table that includes all columns
an extended view for each table that includes lookup columns (types, statuses, etc.)
stored procedures for all updates
functions for any complex queries
this allows the DBA to work directly with the table (to add columns, clean things up, inject data, etc.) without disturbing the code base, and it insulates the code base from any changes made to the table (temporary or otherwise)
there may be performance penalties for doing things this way, but so far they have not been significant - and the benefits of the layer of insulation have been life-savers several times
You won't notice any performance impact for one-table views; SQL Server will use the underlying table when building the execution plans for any code using those views. I recommend you schema-bind those views, to avoid accidentally changing the underlying table without changing the view (think of the poor next guy.)
When a table column is to be renamed
In my experience, this rarely happens. Adding columns, removing columns, changing indexes and changing data types are the usual alter table scripts that you'll run.
It is easier to implement derived attributes (easier than using computed columns).
I would dispute that. What's the difference between putting the calculation in a column definition and putting it in a view definition? Also, you'll see a performance hit for moving it into a view instead of a computed column. The only real advantage is that changing the calculation is easier in a view than by altering a table (due to indexes and data pages.)
You can effectively have aliases for column names.
That's the real reason to have views; aliasing tables and columns, and combining multiple tables. Best practice in my past few jobs has been to use views where I needed to denormalise the data (lookups and such, as you've already pointed out.)
As usual, the most truthful response to a DBA question is "it depends" - on your situation, skillset, etc. In your case, refactoring "everything" is going to break all the apps anyways. If you do fix the base tables correctly, the indirection you're trying to get from your views won't be required, and will only double your schema maintenance for any future changes. I'd say skip the wrapper views, fix the tables and stored procs (which provide enough information hiding already), and you'll be fine.
I agree with Steven's comment--primarily because you are using Access. It's extremely important to keep the pros/cons of Access in focus when re-designing this database. I've been there, done that with the Access front-end/SQL Server back-end (although it wasn't an ADP project).
I would add that views are nice for ensuring that data is not changed outside of the Access forms in the project. The downside is that stored procedures would be required for all updates - if you don't already have those, they'd have to be created too.