Related
I have to make a report with 168 rows. Most of them are sequential data, but there are summation rows for which I need to build helper tables.
Therefore I need to build like 45-50 queries, most of them Append Queries.
Is there a way to minimize the number of queries and develop a large report with 168 rows?
Should I use code?
Just this last year I created a complicated, multi-part and multi-page report with graphs, summations, running averages, trends, "pivot-tables", etc. I did not count how many "rows" of data, but here are some things I did to manage the many queries:
Most important lesson learned: After much optimization and attempts to consolidate and reuse queries and temporary tables, it still turns out that there is no set of "magic few" queries that will return the data you need. Even if you reduce the number of SQL queries from 45 to 35 (which would be impressive in many cases), there are still many queries that you need to manage in an intelligent way. The point is to worry more about writing manageable queries and good infrastructure, rather than making the focus on reducing the number. (If your process is similar, you'll inevitably have to add more queries and more details later anyway.)
Union queries indeed have their place and are sometimes necessary, but simply combining queries to "reduce the number" can have negative consequences. 1) Union queries cannot be built or visualized using the Design View. I consider myself a "real coder", but I still appreciate the ability to use UI components when I can. Design View offers various useful syntax and datatype checks. 2) It is often useful in debugging and optimization to be able to run queries individually. 3) Unions do not improve efficiency and might actually slow down queries when duplicate removal and sorting are not necessary. 4) I have experienced certain perfectly correct queries that result in errors when combined in Unions. I haven't learned how to predict this behavior so it's almost not worth mentioning... except to not be fooled into thinking that the individual queries are somehow flawed. (There are usually workarounds.)
Create all related report queries and temporary tables in a separate Access database and link to the main database. In other words, create a separate reporting front-end if possible. Not only can this keep the source database cleaner, it can make it more efficient (highly dependent on number of users and how they're sharing the database).
Name queries using a consistent pattern. I tried using numbered queries with some success. I personally find that descriptive names are more useful than short, cryptic names. Much cut and pasting becomes necessary however.
VBA code or macros can be better than individual saved queries.
I rarely use complicated macros, so most of these tips are relevant to VBA code, but I won't argue against macros because they offer at least some similar benefits. It's also possible without much work to create a useful "dashboard" form that makes VBA code click-n-run similar to macros.
Comments can be included adjacent to the SQL. This can be invaluable in outlining ugly SQL. For example, it can be worth explaining why you choose a LEFT JOIN with extra WHERE criteria instead of an INNER JOIN, especially to prevent "helpful" coworkers (or yourself) from rewriting a query just to find they failed to consider all the original contexts.
An entire sequence of query texts and execution can be traced and debugged live. If error handling is coded appropriately, you can create custom logs and specialized handling of errors. SQL text can be edited and rerun without stopping the entire process.
Query parameters can be passed to queries without awkward UI prompts. Parameters can be used to code proper queries with user input (i.e. avoid SQL injection) and can reduce number of similar queries that simply have different input or criteria but are otherwise identical.
Multiple queries can be wrapped in a transactions and all committed or rolled back together! Not sure whether macros support this.
You can either move the SQL to VBA, to a macro, or if they're all appending to one table, make a large union subquery. All will reach that goal. For usability, I often go for the macro, since it's click to run. Just SetWarnings first, and then chain RunSQL statements.
The UNION query is also elegant solution if applicable.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
At a new job, I've just been exposed to the concept of putting logic into SQL statements.
In MySQL, a dumb example would be like this:
SELECT
P.LastName, IF(P.LastName='Baldwin','Michael','Bruce') AS FirstName
FROM
University.PhilosophyProfessors P
// This is like a ternary operator; if the condition is true, it returns
// the first value; else the second value. So if a professor's last name
// is 'Baldwin', we will get their first name as "Michael"; otherwise, "Bruce"**
For a more realistic example, maybe you're deciding whether a salesperson qualifies for a bonus. You could grab various sales numbers and do some calculations in your SQL query, and return true / false as a column value called "qualifies."
Previously, I would have gotten all the sales data back from the query, then done the calculation in my application code.
To me, this seems better, because if necessary, I can walk through the application logic step-by-step with a debugger, but whatever the database is doing is a black box to me. But I'm a junior developer, so I don't know what's normal.
What are the pros and cons of having the database server do some of your calculations / logic?
**Code example based on Monty Python sketch.
This way SQL becomes part of your domain model. It's one more (and not necessarily obvious) place where domain knowledge is implemented. Such leaks result in tighter coupling between business logic / application code and database, what usually is a bad idea.
One exception is views, report queries etc. But these usually are so isolated that it's obvious what role they play.
One of the most persuasive reasons to push logic out to the database is to minimise traffic. In the example given, there is little gain, since you are fetching the same amount of data whether the logic is in the query or in your app.
If you want to fetch only users with a first name of Michael, then it makes more sense to implement the logic on the server. Actually, in this simple example, it doesn't make much difference, since you could specify users who's lastname is Baldwin. But consider a more interesting problem, whereby you give each user a "popularity" score based on how common their first and last names are, and you want to fetch the 10 most "popular" users. Calculating "popularity" in the app would mean that you have to fetch every single user before ranking, sorting and choosing them locally. Calculating it on the server means you can fetch just 10 rows across the wire.
There aren't a lot of absolute pros and cons to this argument, so the answer is 'it depends.' Some scenarios with different conditions that affect this decision might be:
Client-server app
One example of a place where it might be appropriate to do this is an older 4GL or rich client application where all database operations were done through stored procedure based update, insert, delete sprocs. In this case the gist of the architecture was to have the sprocs act as the main interface for the database and all business logic relating to particular entities lived in the one place.
This type of architecture is somewhat unfashionable these days but at one point it was considered to be the best way to do it. Many VB, Oracle Forms, Informix 4GL and other client-server apps of the era were done like this and it actually works fairly well.
It's not without its drawbacks, however - SQL is not particularly good at abstraction, so it's quite easy to wind up with fairly obtuse SQL code that presents a maintenance issue through being hard to understand and not as modular as one might like.
Is it still relevant today? Quite often a rich client is the right platform for an application and there's certainly plenty of new development going on with Winforms and Swing. We do have good open-source ORMs today where a 1995 vintage Oracle Forms app might not have had the option of using this type of technology. However, the decision to use an ORM is certainly not a black and white one - Fowler's Patterns of Enterprise Application Architecture does quite a good job of running through a range of data access strategies and discussing their relative merits.
Three tier app with rich object model
This type of app takes the opposite approach, and places all of the business logic in the middle tier model object layer with a relatively thin database layer (or perhaps an off-the-shelf mechanism like an ORM). In this case you are attempting to place all the application logic in the middle-tier. The data access layer has relatively little intelligence, except perhaps for a handful of stored procedured needed to get around limits of an ORM.
In this case, SQL based business logic is kept to a minimum as the main repository of application logic is the middle-tier.
Overhight batch processes
If you have to do a periodic run to pick out records that match some complex criteria and do something with them it may be appropriate to implement this as a stored procedure. For something that may have to go over a significant portion of a decent sized database a sproc based approch is probably going to be the only reasonably performant way to do this sort of thing.
In this case SQL may well be the appropriate way to do this, although traditional 3GLs (particularly COBOL) were designed specifically for this type of processing. In really high volume environments (particularly mainframes) doing this type of processing with flat or VSAM files outside a database may be the fastest way to do it. In addition, some jobs may be inherently record-oriented and procedural, or may be much more transparent and maintanable if implemented in this way.
To paraphrase Ed Post, 'you can write COBOL in any language' - although you might not want to. If you want to keep it in the database, use SQL, but it's certainly not the only game in town.
Reporting
The nature of reporting tools tends to dictate the means of encoding business logic. Most are designed to work with SQL based data sources so the nature of the tool forces the choice on you.
Other domains
Some applications like ETL processing may be a good fit for SQL. ETL tools start to get unwiedly if the transformation gets too complex, so you may want to go for a stored procedure based architecture. Mixing Queries and transformations across extraction, ETL processing and stored-proc based processing can lead to a transformation process that is hard to test and troubleshoot.
Where you have a significant portion of your logic in sprocs it may be better to put all of the logic in this as it gives you a relatively homogeneous and modular code base. In fact I have it on fairly good authority that around half of all data warehouse projects in the banking and insurance sectors are done this way as an explicit design decision - for precisely this reason.
Many times the answer to this type of question is going to depend a great deal on deployment approach. Where it makes the most sense to place your logic depends on what you'll need to be able to get access to when making changes.
In the case of web applications that aren't compiled, it can be easier to deal with changes to a page or file than it is to work with queries (depending on query complexity, programming backgrounds / expertise, etc). In these kinds of situations, logic in the scripting language is typically ok and make make it easier to revise later.
In the case of desktop applications that require more effort to modify, placing this kind of logic in the database where it can be adjusted without requiring a recompilation of the application may benefit you. If there was a decision made that people used to qualify for bonuses at 20k, but now must make 25k, it'd be much easier to adjust that on the SQL Server than to recompile your accounting application for all of your users, for example.
I'm a strong advocate of putting as much logic as possible directly into the database. That means incorporating it in views and stored procedures. I believe that most follows the DRY principle.
For example, consider a table with FirstName and LastName columns, and an application that frequently makes use of a FullName field. You have three choices:
Query first and last name and compute the full name in application code.
Query first, last, and (first || last) in your application's SQL whenever you query the table.
Define a view CustomerExt that includes the first and last columns, and a computed full name column and then query against that view, rather than the customer table.
I believe option 3 is clearly correct. Consider the addition of a MiddleInitial field to the table and the full name computation. Using option 3, you simply need to replace the view and every application across your company will instantly use the new format for FullName. The view still makes the base columns available for those instances in which you need to do some special formatting, but for the standard instance everything works "automatically".
That's a simple case, but the principle is the same for more complex situations. Perform application- or company-wide data logic directly in the database and you do not need to concern yourself with keeping different applications up to date.
The answer depends on your expertise and your familiarity with the technologies involved. Also, if you're a technical manager, it depends on your analysis of the skills of the people working on your team and whom you intend on hiring / keeping on staff to support, extend and maintain the application in future.
If you are not literate and proficient in the database , (as you are not) then stick with doing it in code. If otoh, you are literate and proficient in database coding (as you should be), then there is nothing wrong (and a lot right) abput doing it in the database.
Two other considerations that might influence your decision are whether the logic is of such a complex nature that doing it in database code would be inordinately more complex or more abstract than in code, and second, if the process involved requires data from outside the database (from some other source) In either of these scenarios I would consider moving the logic to a code module.
The fact that you can step through the code in your IDE more easily is really the only advantage to your post-processing solution. Doing the logic in the database server reduces the sizes of result sets, often drastically, which leads to less network traffic. It also allows the query optimizer to get a much better picture of what you really want done, again often allowing better performance.
Therefore I would nearly always recommend SQL logic. If you treat a database as a mere dumb store, it will return the favor by behaving dumb, and depending on the situation, that can absolutely kill your performance - if not today, possibly next year when things have taken off...
That particular first example is a bad idea. Per-row functions do not scale well as the table gets bigger. In fact, a (likely) better way to do it would be to index LastName and use something like:
SELECT P.LastName, 'Michael' AS FirstName
FROM University.PhilosophyProfessors P
WHERE P.LastName = 'Baldwin'
UNION ALL SELECT P.LastName, 'Bruce' AS FirstName
FROM University.PhilosophyProfessors P
WHERE P.LastName <> 'Baldwin'
On databases where data are read more often than written (and that's most of them), these sorts of calculations should be done at write time such as using an insert/update trigger to populate a real FirstName field.
Databases should be used for storing and retrieving data, not doing massive non-databasey calculations that will slow down everything.
One big pro: a query may be all you can work with. Reports have been mentioned: many reporting tools or reporting plugins to existing programs only allow users to make their own queries (the results of which they will display).
If you cannot alter the code (because it isn't yours), you may yet be able to alter a query. And in some cases (data migration), you'll be writing queries to do migration as well.
I like to distinguish data vs business rules, and push the data rules into the stored procs as much as possible. There is not always a hard and fast distinction between the two, but in your example of calculating sales bonuses, the formula itself might be a business rule but the work of gathering and aggregating the various figures used in the formula is a data rule.
Sometimes, though, it depends on the deployment model and change control procedures. If the sales formula changes frequently and deployment of the business layer code is cumbersome, then tweaking just one function/stored proc in the database would be a great solution.
I'm a big fan of elegant database queries because the code is closer to the data and SQL works very well. But such queries, whether they're text in you app, generated by an OR mapper or stored in the database are harder to test, especially in the cloud, because you need a database to run against.
Database is exactly what it's called. DATABASE.
You should not mix the business logic with data layer.
Keep it separate as any close coupling between data and business makes impossible to follow best standards in programming.
I was working recently on a project where all logic was in MS SQL. Horrible idea, that back-fired after few years (energy company), no easy way to scale-out, no easy way to follow up CI/CD, Agile or code repos. Very difficult to co-work, very slow and very inefficient.
Company basically was reaching hardware limits in order to make it work (they've spent £100k on SSD SAN), while you could reach the same performance with C# for business and keep the database for data, with perhaps 3-4 cheap servers, that could easily scale-out.
Horrible, horrible idea. Guess what ? Company went under, as one time SQL server has reached it's potential (sometimes some queries were running for hours (very well written, but SQL is not for business logic. End of story)) when one time failed to bill all DD customers and basically didn't took the monthly payment that they needed to survive till next month (millions of pounds).
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
I'm not quite sure stackoverflow is a place for such a general question, but let's give it a try.
Being exposed to the need of storing application data somewhere, I've always used MySQL or sqlite, just because it's always done like that. As it seems like the whole world is using these databases (most of all software products, frameworks, etc), it is rather hard for a beginning developer like me to start thinking about whether this is a good solution or not.
Ok, say we have some object-oriented logic in our application, and objects are related to each other somehow. We need to map this logic to the storage logic, so relations between database objects are required too. This leads us to using relational database, and I'm ok with that - to put it simple, our database table rows sometimes will need to have references to other tables' rows. But why use SQL language for interaction with such a database?
SQL query is a text message. I can understand this is cool for actually understanding what it does, but isn't it silly to use text table and column names for a part of application that no one ever seen after deploynment? If you had to write a data storage from scratch, you would have never used this kind of solution. Personally, I would have used some 'compiled db query' bytecode, that would be assembled once inside a client application and passed to the database. And it surely would name tables and colons by id numbers, not ascii-strings. In the case of changes in table structure those byte queries could be recompiled according to new db schema, stored in XML or something like that.
What are the problems of my idea? Is there any reason for me not to write it myself and to use SQL database instead?
EDIT To make my question more clear. Most of answers claim that SQL, being a text query, helps developers better understand the query itself and debug it more easily. Personally, I haven't seen people writing SQL queries by hand for a while. Everyone I know, including me, is using ORM. This situation, in which we build up a new level of abstraction to hide SQL, leads to thinking if we need SQL or not. I would be very grateful if you could give some examples in which SQL is used without ORM purposely, and why.
EDIT2 SQL is an interface between a human and a database. The question is why do we have to use it for application/database interaction? I still ask for examples of human beings writing/debugging SQL.
Everyone I know, including me, is using ORM
Strange. Everyone I know, including me, still writes most of the SQL by hand. You typically end up with tighter, more high performance queries than you do with a generated solution. And, depending on your industry and application, this speed does matter. Sometimes a lot. yeah, I'll sometimes use LINQ for a quick-n-dirty where I don't really care what the resulting SQL looks like, but thus far nothing automated beats hand-tuned SQL for when performance against a large database in a high-load environment really matters.
If all you need to do is store some application data somewhere, then a general purpose RDBMS or even SQLite might be overkill. Serializing your objects and writing them to a file might be simpler in some cases. An advantage to SQLite is that if you have a lot of this kind of information, it is all contained in one file. A disadvantage is that it is more difficult to read it. For example, if you serialize you data to YAML, you can read the file with any text editor or shell.
Personally, I would have used some
'compiled db query' bytecode, that
would be assembled once inside a
client application and passed to the
database.
This is how some database APIs work. Check out static SQL and prepared statements.
Is there any reason for me not to
write it myself and to use SQL
database instead?
If you need a lot of features, at some point it will be easier to use an existing RDMBS then to write your own database from scratch. If you don't need many features, a simpler solution may be wiser.
The whole point of database products is to avoid writing the database layer for every new program. Yes, a modern RDMBS might not always be a perfect fit for every project. This is because they were designed to be very general, so in practice, you will always get additional features you don't need. That doesn't mean it is better to have a custom solution. The glove doesn't always need to be a perfect fit.
UPDATE:
But why use SQL language for
interaction with such a database?
Good question.
The answer to that may be found in the original paper describing the relational model A Relational Model of Data for Large Shared Data Banks, by E. F. Codd, published by IBM in 1970. This paper describes the problems with the existing database technologies of the time, and explains why the relational model is superior.
The reason for using the relational model, and thus a logical query language like SQL, is data independence.
Data independence is defined in the paper as:
"... the independence of application programs and terminal activities from the growth in data types and changes in data representations."
Before the relational model, the dominate technology for databases was referred to as the network model. In this model, the programmer had to know the on-disk structure of the data and traverse the tree or graph manually. The relational model allows one to write a query against the conceptual or logical scheme that is independent of the physical representation of the data on disk. This separation of logical scheme from the physical schema is why we use the relational model. For a more on this issue, here are some slides from a database class. In the relational model, we use logic based query languages like SQL to retrieve data.
Codd's paper goes into more detail about the benefits of the relational model. Give it a read.
SQL is a query language that is easy to type into a computer in contrast with the query languages typically used in a research papers. Research papers generally use relation algebra or relational calculus to write queries.
In summary, we use SQL because we happen to use the relational model for our databases.
If you understand the relational model, it is not hard to see why SQL is the way it is. So basically, you need to study the relation model and database internals more in-depth to really understand why we use SQL. It may be a bit of a mystery otherwise.
UPDATE 2:
SQL is an interface between a human
and a database. The question is why do
we have to use it for
application/database interaction? I
still ask for examples of human beings
writing/debugging SQL.
Because the database is a relational database, it only understands relational query languages. Internally it uses a relational algebra like language for specifying queries which it then turns into a query plan. So, we write our query in a form we can understand (SQL), the DB takes our SQL query and turns it into its internal query language. Then it takes the query and tries to find a "query plan" for executing the query. Then it executes the query plan and returns the result.
At some point, we must encode our query in a format that the database understands. The database only knows how to convert SQL to its internal representation, that is why there is always SQL at some point in the chain. It cannot be avoided.
When you use ORM, your just adding a layer on top of the SQL. The SQL is still there, its just hidden. If you have a higher-level layer for translating your request into SQL, then you don't need to write SQL directly which is beneficial in some cases. Some times we do not have such a layer that is capable of doing the kinds of queries we need, so we must use SQL.
Given the fact that you used MySQL and SQLite, I understand your point of view completely. Most DBMS have features that would require some of the programming from your side, while you get it from database for free:
Indexes - you can store large amounts of data and still be able to filter and search very quickly because of indexes. Of course, you could implement you own indexing, but why reinvent the wheel
data integrity - using database features like cascading foreign keys can ensure data integrity across the system. You only need to declare relationship between data, and system takes care of the rest. Of course, once more, you could implement constraints in code, but it's more work. Consider, for example, deletion, where you would have to write code in object's destructor to track all dependent objects and act accordingly
ability to have multiple applications written in different programming languages, working on different operating systems, some even distributed across the network - all using the same data stored in a common database
dead easy implementation of observer pattern via triggers. There are many cases where only some data depends on some other data and it does not affect UI aspect of application. Ensuring consistency can be very tricky or require a lot of programming. Of course, you could implement trigger-like behavior with objects but it requires more programming than simple SQL definition
There are some good answers here. I'll attempt to add my two cents.
I like SQL, I can think in it pretty easily. The queries produced by layers on top of the DB (like ORM frameworks) are usually hideous. They'll select tons of extra stuff, join in things you don't need, etc.; all because they don't know that you only want a small part of the object in this code. When you need high performance, you'll often end up going in and using at least some custom SQL queries in an ORM system just to speed up a few bottlenecks.
Why SQL? As others have said, it's easy for humans. It makes a good lowest common denominator. Any language can make SQL and call command line clients if necessary, and they is pretty much always a good library.
Is parsing out the SQL inefficient? Somewhat. The grammar is pretty structured, so there aren't tons of ambiguities that would make the parser's job really hard. The real thing is that the overhead of parsing out SQL is basically nothing.
Let's say you run a query like "SELECT x FROM table WHERE id = 3", and then do it again with 4, then 5, over and over. In that case, the parsing overhead may exist. That's why you have prepared statements (as others have mentioned). The server parses the query once, and can swap in the 3 and 4 and 5 without having to reparse everything.
But that's the trivial case. In real life, your system may join 6 tables and have to pull hundreds of thousands of records (if not more). It may be a query that you let run on a database cluster for hours, because that's the best way to do things in your case. Even with a query that takes only a minute or two to execute, the time to parse the query is essentially free compared to pulling records off disk and doing sorting/aggregation/etc. The overhead of sending the ext "LEFT OUTER JOIN ON" is only a few bytes compared to sending special encoded byte 0x3F. But when your result set is 30 MB (let alone gigs+), those few extra bytes are worthless compared to not having to mess with some special query compiler object.
Many people use SQL on small databases. The biggest one I interact with is only a few dozen gigs. SQL is used on everything from tiny files (like little SQLite DBs may be) up to terabyte size Oracle clusters. Considering it's power, it's actually a surprisingly simple and small command set.
It's an ubiquitous standard. Pretty much every programming language out there has a way to access SQL databases. Try that with a proprietary binary protocol.
Everyone knows it. You can find experts easily, new developers will usually understand it to some degree without requiring training
SQL is very closely tied to the relational model, which has been thoroughly explored in regard to optimization and scalability. But it still frequently requires manual tweaking (index creation, query structure, etc.), which is relatively easy due to the textual interface.
But why use SQL language for interaction with such a database?
I think it's for the same reason that you use a human-readable (source code) language for interaction with the compiler.
Personally, I would have used some 'compiled db query' bytecode, that would be assembled once inside a client application and passed to the database.
This is an existing (optional) feature of databases, called "stored procedures".
Edit:
I would be very grateful if you could give some examples in which SQL is used without ORM purposely, and why
When I implemented my own ORM, I implemented the ORM framework using ADO.NET: and using ADO.NET includes using SQL statements in its implementation.
After all the edits and comments, the main point of your question appears to be : why is the nature of SQL closer to being a human/database interface than to being an application/database interface ?
And the very simple answer to that question is : because that is exactly what it was originally intended to be.
The predecessors of SQL (QUEL being presumably the most important one) were intended to be exactly that : a QUERY language, i.e. one that didn't have any of INSERT, UPDATE, DELETE.
Moreover, it was intended to be a query language that could be used by any user, provided that user was aware of the logical structure of the database, and obviously knew how to express that logical structure in the query language he was using.
The original ideas behind QUEL/SQL were that a database was built using "just any mechanism conceivable", that the "real" database could be really just anything (e.g. one single gigantic XML file - allthough 'XML' was not considered a valid option at the time), and that there would be "some kind of machinery" that understood how to transform the actual structure of that 'just anything' into the logical relational structure as it was perceived by the SQL user.
The fact that in order to actually achieve that, the underlying structures are required to lend themselves to "viewing them relationally", was not understood as well in those days as it is now.
Yes, it is annoying to have to write SQL statements to store and retrieve objects.
That's why Microsoft have added things like LINQ (language integrated query) into C# and VB.NET to make it possible to query databases using objects and methods instead of strings.
Most other languages have something similar with varying levels of success depending on the abilities of that language.
On the other hand, it is useful to know how SQL works and I think it is a mistake to shield yourself entirely from it. If you use the database without thinking you can write extremely inefficient queries and index the database incorrectly. But once you understand how to use SQL correctly and have tuned your database, you have a very powerful tried-and-tested tool available for finding exactly the data you need extremely quickly.
My biggest reason for SQL is Ad-hoc reporting. That report your business users want but don't know that they need it yet.
SQL is an interface between a human
and a database. The question is why do
we have to use it for
application/database interaction? I
still ask for examples of human beings
writing/debugging SQL.
I use sqlite a lot right from the simplest of tasks (like logging my firewall logs directly to a sqlite database) to more complex analytic and debugging tasks in my day-to-day research. Laying out my data in tables and writing SQL queries to munge them in interesting ways seems to be the most natural thing to me in these situations.
On your point about why it is still used as an interface between application/database, this is my simple reasoning:
There is about 3-4 decades of
serious research in that area
starting in 1970 with Codd's seminal
paper on Relational Algebra.
Relational Algebra forms the
mathematical basis to SQL (and other
QLs), although SQL does not
completely follow the relational
model.
The "text" form of the language
(aside from being easily
understandable to humans) is also
easily parsable by machines (say
using a grammar parser like like
lex) and is easily convertable to whatever "bytecode" using any number of optimizations.
I am not sure if doing this in any
other way would have yielded
compelling benefits in the generic cases. Otherwise it
would have been probably discovered
and adopted in the 3 decades of
research. SQL probably provides the
best tradeoffs when bridging the
divide between humans/databases and
applications/databases.
The question that then becomes interesting to ask is, "What are the real benefits of doing SQL in any other "non-text" way?" Will google for this now:)
SQL is a common interface used by the DBMS platform - the entire point of the interface is that all database operations can be specified in SQL without needing supplementary API calls. This means that there is a common interface across all clients of the system - application software, reports and ad-hoc query tools.
Secondly, SQL gets more and more useful as queries get more complex. Try using LINQ to specify a 12-way join a with three conditions based on existential predicates and a condition based on an aggregate calculated in a subquery. This sort of thing is fairly comprehensible in SQL but unlikely to be possible in an ORM.
In many cases an ORM will do 95% of what you want - most of the queries issued by applications are simple CRUD operations that an ORM or other generic database interface mechanism can handle easily. Some operations are best done using custom SQL code.
However, ORMs are not the be-all and end-all of database interfacing. Fowler's Patterns of Enterprise Application Architecture has quite a good section on other types of database access strategy with some discussion of the merits of each.
There are often good reasons not to use an ORM as the primary database interface layer. An example of a good one is that platform database libraries like ADO.Net often do a good enough job and integrate nicely with the rest of the environment. You might find that the gain from using some other interface doesn't really outweigh the benefits from the integration.
However, the final reason that you can't really ignore SQL is that you are ultimately working with a database if you are doing a database application. There are many, many WTF stories about screw-ups in commercial application code done by people who didn't understand databases properly. Poorly thought-out database code can cause trouble in so many ways, and blithely thinking that you don't need to understand how the DBMS works is an act of Hubris that is bound to come and bite you some day. Worse yet, it will come and bite some other poor schmoe who inherits your code.
While I see your point, SQL's query language has a place, especially in large applications with a lot of data. And to point out the obvious, if the language wasn't there, you couldn't call it SQL (Structured Query Language). The benefit of having SQL over the method you described is SQL is generally very readable, though some really push the limits on their queries.
I whole heartly agree with Mark Byers, you should not shield yourself from SQL. Any developer can write SQL, but to really make your application perform well with SQL interaction, understanding the language is a must.
If everything was precompiled with bytecode as you described, I'd hate to be the one to have to debug the application after the original developer left (or even after not seeing the code for 6 months).
I think the premise of the question is incorrect. That SQL can be represented as text is immaterial. Most modern databases would only compile queries once and cache them anyway, so you already have effectively a 'compiled bytecode'. And there's no reason this couldn't happen client-wise though I'm not sure if anyone's done it.
You said SQL is a text message, well I think of him as a messenger, and, as we know, don't shoot the messenger. The real issue is that relations are not a good enough way of organising real world data. SQL is just lipstick on the pig.
If the first part you seem to refer to what is usually called the Object - relational mapping impedance. There are already a lot of frameworks to alleviate that problem. There are tradeofs as well. Some things will be easier, others will get more complex, but in the general case they work well if you can afford the extra layer.
In the second part you seem to complain about SQL being text (it uses strings instead of ids, etc)... SQL is a query language. Any language (computer or otherwise) that is meant to be read or written by humans is text oriented for that matter. Assembly, C, PHP, you name it. Why? Because, well... it does make sense, doesn't it?
If you want precompiled queries, you already have stored procedures. Prepared statements are also compiled once on the fly, IIRC. Most (if not all) db drivers talk to the database server using a binary protocol anyway.
yes, text is a bit inefficient. But actually getting the data is a lot more costly, so the text based sql is reasonably insignificant.
SQL was created to provide an interface to make ad hoc queries against a relational database.
Generally, most relational databases understand some form of SQL.
Object-oriented databases exist, and (presumably) use objects to do their querying... but as I understand it, OO databases have a lot more overheard, and relational databases work just fine.
Relational Databases also allow you to operate in a "disconnected" state. Once you have the information you asked for, you can close the database connection. With an OO database, you either need to return all objects related to the current one (and the ones they're related to... and the... etc...) or reopen the connection to retrieve new objects as they are accessed.
In addition to SQL, you also have ORMs (object-relational mappings) that map objects to SQL and back. There are quite a few of them, including LINQ (.NET), the MS Entity Framework (.NET), Hibernate (Java), SQLAlchemy (Python), ActiveRecord (Ruby), Class::DBI (Perl), etc...
A database language is useful because it provides a logical model for your data independent of any applications that use it. SQL has a lot of shortcomings however, not the least being that its integration with other languages is poor, type support is about 30 years behind the rest of the industry and it has never been a truly relational language anyway.
SQL has survived mostly because the database market has been and remains dominated by the three mega-vendors who have a vested interest in protecting their investment. That's changing and SQL's days are probably numbered but the model that will finally replace it probably hasn't arrived yet - although there are plenty of contenders around these days.
I don't think most people are getting your question, though I think it's very clear. Unfortunately I don't have the "correct" answer. I would guess it's a combination of several things:
Semi-arbitrary decisions when it was designed such as ease of use, not needing a SQL compiler (or IDE), portability, etc.
It happened to catch on well (probably due to similar reasons)
And now due to historical reasons (compatibility, well known, proven, etc.) continues to be used.
I don't think most companies have bothered with another solution because it works well, isn't much of a bottleneck, it's a standard, blah, blah..
One of the Unix design principles can be said thusly, "Write programs to handle text streams, because that is a universal interface.".
And that, I believe, is why we typically use SQL instead of some 'byte-SQL' that only has a compilation interface. Even if we did have a byte-SQL, someone would write a "Text SQL", and the loop would be complete.
Also, MySQL and SQLite are less full-featured than, say, MSSQL and Oracle SQL. So you're still in the low end of the SQL pool.
Actually there are a few non-SQL database (like Objectivity, Oracle Berkeley DB, etc.) products came but non of them succeeded. In future if someone finds intuitive alternative for SQL, that will answer your question.
There are a lot of non relational database systems. Here are just a few:
Memcached
Tokyo Cabinet
As far as finding a relational database that doesn't use SQL as its primary interface, I think you won't find it. Reason: SQL is a great way to talk about relations. I can't figure out why that's a big deal to you: if you don't like SQL, put an abstraction over it (like an ORM) so you don't have to worry about it. Let the abstraction worry about it. It gets you to the same place.
However, the problem your'e really mentioning here is the object-relation disconnect - the problem is with the relation itself. Objects and relational-tuples don't always lend themselves to be a 1-1 relationship, which is the reason why a developer can frustrated with a database. The solution to that is to use a different database type.
Because often, you cannot be sure that (citing you) "no one ever seen after deployment". Knowing that there is an easy interface for reporting and for dataset level querying is a good path for evolution of your app.
You're right, that there are other solutions that may be valid in some situations: XML, plain text files, OODB...
But having a set of common interfaces (like ODBC) is a huge plus for the life of data.
I think the reason might be the search/find/grab algorithms the sql laungage is connected to do. Remember that sql has been developed for 40 years - and the goal has been both preformence wise and user firendly wise.
Ask yourself what the best way of finding 2 attibutes is. Now why investigating that each time you would want to do something that includes that each time you develope your application. Assuming the main goal is the developing of your application when developing an application.
An application has similarities with other applications, a database has similarities with other databases. So there should be a "best way" of these to interact, logically.
Also ask yourself how you would develop a better console only application that does not use sql laungage. If you cannot do that I think you need to develope a new kind of GUI that are even more fundamentally easier to use than with a console - to develope things from it. And that might actually be possible. But still most development of applications is based around console and typing.
Then when it comes to laungage I don´t think you can make a much more fundamentally easier text laungage than sql. And remember that each word of anything is inseperatly connected to its meaning - if you remove the meaning the word cannot be used - if you remove the word you cannot communicate the meaning. You have nothing to describe it with (And maybe you cannot even think it beacuse it woulden´t be connected to anything else you have thought before...).
So basically the best possible algorithms for database manipulation are assigned to words - if you remove these words you will have to assign these manipulations something else - and what would that be?
i think you can use ORM
if and only if you know the basic of sql.
else the result there isn't the best
Is there such a thing as too many stored procedures?
I know there is not a limit to the number you can have but is this any performance or architectural reason not to create hundreds, thousands??
To me the biggest limitation to that hundreds or thousands store procedure is maintainability. Even though that is not a direct performance hit, it should be a consideration. That is an architectural stand point, you have to plan not just for the initial development of the application, but future changes and maintenance.
That being said you should design/create as many as your application requires. Although with Hibernate, NHibernate, .NET LINQ I would try to keep as much store procedures logic in the code, and only put it in the database when speed is a factor.
Yes you can.
If you have more than zero, you have too many
Do you mean besides the fact that you have to maintain all of them when you make a change to the database? That's the big reason for me. It's better to have fewer ones that can handle many scenarios than hundreds that can only handle one.
I would just mention that with all such things maintenance becomes an issue. Who knows what they all are and what purpose they serve? What happens years down the way when they are just artifacts of a legacy system that no one can recall what they are for but are scared to mess with?
I think the main thing is the question of not is it possible, but should it be done. Thats just one thought anyway.
I have just under 200 in a commercial SQL Server 2005 product of mine, which is likely to increase by about another 10-20 in the next few days for some new reports.
Where possible I write 'subroutine' sprocs, so that anytime I find myself putting 3 or 4 identical statements together in more than a couple of sprocs, it's time to turn those few statements into a subroutine, if you see what I mean.
I don't tend to use the sprocs to perform all the business logic as such, I just prefer to have sprocs doing anything that could be seen as 'transactional' - so where as my client code (in Delphi but whatever) might do the odd insert or update itself, as soon as something requires a couple of things to be updated or inserted 'at once', it's time for a sproc.
I have a simple, crude naming convention to assist in the readability (and in maintenance!)
The product's code name is 'Rachel', so we have
RP_whatever - these are general sprocs that update/insert data,
- or return small result sets
RXP_whatever - these are subroutine sprocs that server as 'functions'
- or workers to the RP_ type procs
REP_whatever - these are sprocs that simply act as glorified views almost
- they don't alter data, they return potentially complex result sets
- for reporting purposes, etc
XX_whatever - these are internal test/development/maintenance sprocs
- that the client application (or the local dba) would not normally use
the naming is arbitrary, it's just to help distinguish from the sp_ prefix that SQL Server uses.
I guess if I found I had 400-500 sprocs in the database I might become concerned, but a few hundred isn't a problem at all as long as you have a system for identifying what kind of activity each sproc is responsible for. I'd rather chase down schema changes in a few hundred sprocs (where the SQL Server tools can help you find dependencies etc) than try to chase down schema changes in my high-level programming language.
I suppose the deeper question here is- where else should all of that code go? Views can be used to provide convenient access to data through standard SQL. More powerful application languages and libraries can be used to generate SQL. Stored procedures tend to obscure the nature of the data and introduce an unnecessary layer of abstraction, complexity, and maintenance to a system.
Given the power of object relational mapping tools to generate basic SQL, any system that is overly dependent on stored procedures is going to be less efficient to develop. With ORM, there is simply a lot less code to write.
Hell yes, you can have too many. I worked on a system a couple of year ago that had something like 10,000 procs. It was insane. The people who wrote the system didn't really know how to program but they did know how to right badly structured procs, so they put almost all of the application logic in the procs. Some of the procs ran for thousands of line. Managing the mess was a nightmare.
Too many (and I can't draw you a specific line in the sand) is probably an indicator of poor design. Besides, as others have pointed out, there are better ways to achieve a granular database interface without resorting to massive numbers of procs.
My question is why aren't you using some sort of middleware platform if you're talking about hundreds or thousands of stored procedures?
Call me old fashioned but I thought the database should just hold the data and the program(s) should be the one making the business logic decisions.
in Oracle database you can use Packages to group related procedures together.
a few of those and your namespace would free up.
Too much of any particular thing probably means you're doing it wrong. Too many config files, too many buttns on the screen ...
Can't speak for other RDBMSs but an Oracle application that uses PL/SQL should be logically bundling procedures and functions into packages to prevent cascading invalidations and manage code better.
To me, using lots of stored procedures seems to lead you toward something equivalent to the PHP API. Thousands of global functions which may have nothing to do with eachother all grouped together. The only way to relate between them is to have some naming convention where you prefix each function with a module name, similar to the mysql_ functions in PHP. I think this is very hard to maintain, and very hard to to keep everything consistent. I think stored procedures work well for things that really need to take place on the server. A stored procedure for a simple select query, or even a select query with a join is probably going to far. Only use stored procedures where you actually require advanced logic to be handled on the database server.
As others have said, it really comes down to the management aspect. After a while, it turns into finding the proverbial needle in the haystack. That's even more so when there's poor naming standards in place...
No, not that I know of. However, you should have a good naming convention for them. For example, starting them with usp_ is something that a lot of poeple like to do (fastr than starting with sp_).
Maybe your reporting sprocs should be usp_reporting_, and your bussiness objects should be usp_bussiness_, etc. As long s you can manage them, there shouldn't be a problem with having a lot of them.
If they get too big you might want to split the database up into multiple databases. This might make more logical sense and can help with database size and such.
Yep, you can have toooooo many stored procedures.
Naming conventions can really help with this. For instance, if you have a naming convention where you always have the table name and InsUpd or Get, it's pretty easy to find the right stored procedure when you need it. If you don't have some kind of standard naming convention it would be really easy to come up with two (or more) procedures that do almost exactly the same thing.
It would be easy to find a couple of the following without having a standardized naming convention... usp_GetCustomersOrder, CustomerOrderGet_sp, GetOrdersByCustomer_sp, spOrderDetailFetch, GetCustOrderInfo, etc.
Another thing that starts happening when you have a lot of stored procedures is that you'll have some stored procedures that are never used and some that are just rarely used... If you don't have some way of tracking stored procedure usage you'll either end up with a lot of unused procedures... or worse, getting rid of one you think is never used and finding out afterwards that it's only used once a year or once a quarter. :(
Jeff
Not while you have sane naming conventions, up-to-date documentation and enough unit tests to keep them organized.
My first big project was developed with an Object Relational Mapper that abstracted the database so we did everything object oriented and it was easier to mantain and fix bugs and it was especially easy to make changes since all the data access and business logic was C# code, however, when it came to do complex stuff the system felt slow so we had to work around those things or re-architect the project, especially when the ORM had to do complex joins in the database.
Im now working on a different company that has a motto of "our applications are just frontends of the database" and it has its advantages and disadvantages. We have really fast applications since all of the data operations are done on stored procedures, this is using mainly SQL Server 2005, but, when it comes to make a change, or a fix to the software its harder because you have to change both, the C# code and the SQL stored procedures so its like working twice, contrary to when I used an ORM, you dont have refactoring or strongly typed objects in sql, Management Studio and other tools help a lot but normally you spend more time going this way.
So I would say that deppending on the needs of the project, if you are not going to keep really complex business data than maybe you could even avoid using stored procedures at all and use an ORM that makes developer life's easier. If you are concerned about performance and need to save all of the resources than you should use stored procedures, of course, the quantity deppends uppon the architecture that you design, so for exmaple if you always have stored procedures for CRUD operations you would need one for inserts, one for update, one for deletion, one for selecting and one for listing since these are the most common operations, if you have 100 business objects, multiply these 4 by that and you get 400 stored procedures just to manage the most basic operations on your objects so, yes, they could be too many.
If you have a lot of stored procedures, you'll find you are tying yourself to one database - some of them may not easily be transferred.
Tying yourself to one database isn't good design.
Moreover, if you have business logic on the database and in a business layer then maintenance becomes a problem.
i think as long as you name them uspXXX and not spXXX the lookup by name will be direct, so no downside - though you might wonder about generalizing the procs if you have thousands...
You have to put all that code somewhere.
If you are going to have stored procedures at all (I personally do favour them, but many do not), then you might as well use them as much as you need to. At least all the database logic is in one place. The downside is that there isn't typically much structure to a bucket full of stored procs.
Your call! :)
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 2 years ago.
Improve this question
I once worked with an architect who banned the use of SQL views. His main reason was that views made it too easy for a thoughtless coder to needlessly involve joined tables which, if that coder tried harder, could be avoided altogether. Implicitly he was encouraging code reuse via copy-and-paste instead of encapsulation in views.
The database had nearly 600 tables and was highly normalised, so most of the useful SQL was necessarily verbose.
Several years later I can see at least one bad outcome from the ban - we have many hundreds of dense, lengthy stored procs that verge on unmaintainable.
In hindsight I would say it was a bad decision, but what are your experiences with SQL views? Have you found them bad for performance? Any other thoughts on when they are or are not appropriate?
There are some very good uses for views; I have used them a lot for tuning and for exposing less normalized sets of information, or for UNION-ing results from multiple selects into a single result set.
Obviously any programming tool can be used incorrectly, but I can't think of any times in my experience where a poorly tuned view has caused any kind of drawbacks from a performance standpoint, and the value they can provide by providing explicitly tuned selects and avoiding duplication of complex SQL code can be significant.
Incidentally, I have never been a fan of architectural "rules" that are based on keeping developers from hurting themselves. These rules often have unintended side-effects -- the last place I worked didn't allow using NULLs in the database, because developers might forget to check for null. This ended up forcing us to work around "1/1/1900" dates and integers defaulted to "0" in all the software built against the databases, and introducing a litany of bugs caused by devs working around places where NULL was the appropriate value.
You've answered your own question:
he was encouraging code reuse via copy-and-paste
Reuse the code by creating a view. If the view performs poorly, it will be much easier to track down than if you have the same poorly performing code in several places.
Not a big fan of views (Can't remember the last time I wrote one) but wouldn't ban them entirely either. If your database allows you to put indexes on the views and not just on the table, you can often improve performance a good bit which makes them better. If you are using views, make sure to look into indexing them.
I really only see the need for views for partitioning data and for extremely complex joins that are really critical to the application (thinking of financial reports here where starting from the same dataset for everything might be critical). I do know some reporting tools seem to prefer views over stored procs.
I am a big proponent of never returning more records or fields than you need in a specific instance and the overuse of views tends to make people return more fields (and in way too many cases, too many joins) than they need which wastes system resources.
I also tend to see that people who rely on views (not the developer of the view - the people who only use them) often don't understand the database very well (so they would get the joins wrong if not using the view) and that to me is critical to writing good code against the database. I want people to understand what they are asking the db to do, not rely on some magic black box of a view. That is all personal opinion of course, your mileage may vary.
Like BlaM I personally haven't found them easier to maintain than stored procs.
Edited in Oct 2010 to add:
Since I orginally wrote this, I have had occasion to work with a couple of databases designed by people who were addicted to using views. Even worse they used views that called views that called views (to the point where eventually we hit the limit of the number of tables that can be called). This was a performance nightmare. It took 8 minutes to get a simple count(*) of the records in one view and much longer to get data. If you use views, be very wary of using views that call other views. You will be building a system that will very probably not work under the normal performance load on production. In SQL Server you can only index views that do not call other views, so what ends up happening when you use views in a chain, is that the entire record set has to be built for each view and it is not until you get to the last one that the where clause criteria are applied. You may need to generate millions of records just to see three. You may join to the same table 6 times when you really only need to join to it once, you may return many many more columns than you need in the final results set.
My current database was completely awash with countless small tables of no more than 5 rows each. Well, I could count them but it was cluttered. These tables simply held constant type values (think enum) and could very easily be combined into one table. I then made views that simulated each of the tables I deleted to ensure backward compactability. Worked great.
One thing that hasn't been mentioned thus far is use of views to provide a logical picture of the data to end users for ad hoc reporting or similar.
This has two merits:
To allow the user to single "tables" containing the data they expect rather requiring relatively non technical users to work out potentially complex joins (because the database is normalised)
It provides a means to allow some degree of ah hoc access without exposing the data or the structure to the end users.
Even with non ad-hoc reporting its sometimes signicantly easier to provide a view to the reporting system that contains the relveant data, neatly separating production of data from presentation of same.
Like all power, views have its own dark side. However, you cannot blame views for somebody writing bad performing code. Moreover views can limit the exposure of some columns and provide extra security.
Views are good for ad-hoc queries, the kind that a DBA does behind the scenes when he/she needs quick access to data to see what's going on with the system.
But they can be bad for production code. Part of the reason is that it's sort of unpredictable what indexes you will need with a view, since the where clause can be different, and therefore hard to tune. Also, you are generally returning a lot more data than is actually necesary for the individual queries that are using the view. Each of these queries could be tightened up and tuned individually.
There are specific uses of views in cases of data partitioning that can be extremely useful, so I'm not saying they should avoided altogether. I'm just saying that if a view can be replaced by a few stored procedures, you will be better off without the view.
We use views for all of our simple data exports to csv files. This simplifies the process of writing a package and embedding the sql within the package which becomes cumbersome and hard to debug against.
Using views, we can execute a view and see exactly what was exported, no cruft or unknowns. It greatly helps in troubleshooting problems with improper data exports and hides any complex joins behind the view. Granted, we use a very old legacy system from a TERMS based system that exports to sql, so the joins are a little more complex than usual.
Some time ago I've tried to maintain code that used views built from views built from views... That was a pain in the a**, so I got a little allergic to views :)
I usually prefer working with tables directly, especially for web applications where speed is a main concern. When accessing tables directly you have the chance to tweak your SQL-Queries to achieve the best performance. "Precompiled"/cached working plans might be one advantage of views, but in many cases just-in-time compilation with all given parameters and where clauses in consideration will result in faster processing over all.
However that does not rule out views totally, if used adequately. For example you can use a view with the "users" table joined with the "users_status" table to get an textual explanation for each status - if you need it. However if you don't need the explanation: use the "users" table, not the view. As always: Use your brain!
Views have been helpful to us in their role for use by public web based applications that dip from a production database. Simplified security is the primary advantage we see since the table design in the database may combine sensitive and non-sensitive data within the same table. A stored procedure shares much of this advantage, but the view is read-only, has potential interop advantages, and is a less complex thing for junior people to implement.
This security abstraction advantage also applies when views are used for end-user ad-hoc queries; this would be less of an advantage if we had a proper, flattened, data warehouse representation of our data.
From an application stand point which uses an ORM, it's a lot harder to execute a custom query than doing a select on a discretely mapped type (eg, the view).
For example, if you need just 5 fields of a table that has many (say 30 or 40) an ORM framework will create an entity to represent the table.
That means that even though you only need a few properties of the entity, the select query generated by the ORM framework will bring the entire entity in its full glory. A view on the other hand, although also mapped to an entity with the ORM framework, will only bring the data you need.
Second, since ORM frameworks map entities to tables, relationships between entities are generated (and hydrated) on the client side, meaning that the query has to execute and return to the app before linking of those entities can happen at runtime within the app.
Some frameworks bypass that by returning the data from multiple linked entities in a giant select (with multiple joins), bringing in the columns of all related tables in one call. Internally the framework disassembles the giant result set and structures the logical presentation of the linked entities before returning those entities to the caller app.
Point being is that views are a life saver for apps using ORM. The alternative is to manually make db calls, and manually passing the resulting recordsets into usable entities/models.
While this approach is good and definitely produces a result, it has lots of negative facets. Manual code... is manual; hard to maintain, cumbersome in implementation, and causes devs to worry more about the specifics of the DB provider API vs the logical domain model. Not to mention that it increases time to production (its a lot more labourious) costs for development, maintenance, surface area of bugs, etc.
So for anyone saying views are bad, please consider the other side of things; The stuff the high and mighty DBA's most often have no clue about.
Let's see if I can come up with a lame analogy ...
"I don't need a phillips screwdriver. I carry a flat head and a grinder!"
Dismissing views out of hand will cause pain long term. For one, it's easier to debug and modify a single view definition than it is to ship modified code.
Views can also reduce the size of complex queries (in the same way stored procs can).
This can reduce network bandwith for very busy databases.