How to use/combine native SQL-elements in/with Ruby on Rails? - sql

Long ago I have learned sql and within the last years of application development I realized I only rarely really play with a real sql console or sql commands at all, especially since I mainly work with rails applications for a while now.
But right now I am working on getting a few Microsoft certifications, so for that I did end up relearning sql from scratch. And by that so many things come to mind that - I have to admit - I have forgotten over all the years. Yes, from a developers view, sql is important, but somehow I didn't need much of it... like stored procedures, functions, triggers, etc...
While I already found a nice blog from Nasir about using Views in Rails,
I am still wondering if I can use
functions
stored procedures
triggers
in Rails.
Triggers: Of course I wouldn't need do define triggers within a rails application. I would create them directly on the database management console. I'd just have to remember the things that are 'automatically' done. I would like to use those for logging purposes or pre-calculations of quick-access-tables ...
Functions: They should be easy to use I would think. Is it possible to add them via the 'select'-method of ActiveRecord?
Stored Procedures: How would one use those from Rails, i mean they could be valuable if you have several complex queries with multiple joins and calculation-based dependencies. I wonder a) how to call one and b) how to receive the results
Well, if you have more insight to the inner workings of Rails in relation to sql and could point out if these native sql-elements are available for/from a Rails-application it would be lovely if you could point at some Howto's, Tutorials.
Another thing I am wondering about is the use of foreign keys. Rails doesn't use them explicitly on the sql-side... would it be useful/helpful to manually add them to the database relations? Or would they hinder Rails' data access?
Thanks for any response, I am eager to find out what I can do between Rails and Sql to combine them in a maybe more efficient way.

As you will have noticed, and mention yourself: rails does a good job of hiding much of the sql/implementation details.
Still, I believe it is very important to use your sql and database wisely.
Validations
Validations should be defined on your database as much as possible, not only in rails. You define it in Rails to give nice user-feedback if needed. But ultimately you do not know how data gets into the database: some operator can use sql, maybe other programs interface with the database, or more frequent: two rails processes can insert data nearly simultaneously.
Foreign keys
Should most definitely be defined on your database. For rails is not needed, it will write the queries correctly, but this will guard your database against wrong data. This will safeguard your data integrity. If someones deletes a record and another record is still pointing to that, your database will complain.
Indexes
This is even more easily overlooked: create indexes! On your primary key (automatically), on much searched on fields (like name), on foreign key fields !!
Complicated queries
As much as rails helps you when retrieving items, for some queries it is much more efficient to write the query yourself. While I will avoid it as long as possible, find_by_sql is a powerful tool.
And rails is extremely powerful/helpful in treating the result of a find_by_sql as a normal result.
Stored procedures, functions, ...
Normally you do not need them when using rails. But there are some very valid cases where they are very useful. For instance, I have created a geographic information system, where we used stored procedures to create various spatial objects. In rails you can directly execute sql using the
YourModel.connection.execute(' .. your sql here .. ')
So even execute stored procedures. It will not be for everyone, but there are some very valid reasons to move work to the database. For example if you have to perform an operation over a whole lot of tables or rows, it could be very efficient to call a stored procedure instead of retrieving all the data, changing the rows, and saving them back. It depends on your problem at hand.
Conclusion
I want to make absolutely clear that Rails nicely abstracts the database away, and for everyday use this is just great. You should define foreign keys, indexes and constraints on your database. For the more advanced stuff like functions, stored procedures, complicated queries: Rails does not stop you from doing anything complicated if needed. One should consider your database as a tool as much as Rails is. But remember:
Premature optimization is the root of all evil. --Donald Knuth
So the options are available, but only use them if it is really necessary.

First of all, I don't particularly share your sentiments with regards to the following completely
Yes, from a developers view, sql is important, but somehow I didn't
need much of it... like stored procedures, functions, triggers, etc...
SQL is important if you're developing all the time for extreme "scalability" — and do note I'm using the term here very loosely as Rails does scale if you know how to go about it, but with limits as stories such as Twitter's switch from Rails etc. are quite ubiquitous.
However, Rails isn't always about extreme performance. It is first and foremost about developer 'happiness', increased productivity, and reduced time-to-market. This is done quite well, especially in Rails 3.x+ with the high abstraction introduced and it is this very abstraction that ultimately has resulted in Rails being considered as slow, and by most standards it definitely is.
Do note, that I do not intend to cause any 'language' flame-wars here but this is my personal take on the purpose of Rails; I like to call a duck a duck myself (much like two of my favourite languages: python & ruby).
The simple answer to your questions though can be realised as follows:
Triggers: AR callbacks
Functions: AR Query DSL, i.e. .where('name = ?', :full_name)
Stored Procedure: tied into model business logic, typically based on AR callback.
Another thing I am wondering about is the use of foreign keys. Rails
doesn't use them explicitly on the sql-side... would it be
useful/helpful to manually add them to the database relations?
Au contraire mon ami! All AR relations require that the various foreign keys are included in whatever appropriate migrations one includes. It's a common misconception that 'Rails will walk your dog and clean after it as well'. Remember that AR queries are ultimately expressed on your backend solution, in this case an SQL one for the sake of this topic, in SQL (alternatives are SQlite/PostgreSQL etc...).
However*, rails does not make use of FK constraints because you can actually make an _id field point to a non-existant record; they aren't really foreign keys in the strictest sense.
Note: the * indicates that credit for this statement should go to my friend David Workman (workmad3) and revision to the last paragraph of the above.

Related

How to avoid SQL statements spreading everywhere in your app?

I have a medium-sized app written in Ruby, which makes pretty heavy use of a RDBMS. As our code grows, I found the ugly SQL statements are spreading to all modules and methods in my app and embedded in many application logic. I am not sure if this is bad, however, my gut tells me this is quite ugly...
So generally in any languages, how do you manage your SQL statements? Or do you think it is harmful for maintainibility to let many SQL statements embedded in the application logic? Why or why not?
Thanks.
SQL is a language for accessing databases. Often, it gets confused as being the API into the data store for a larger application. In fact, you should design a real API between the data store and the app.
The means several things.
For accessing data stored in tables, you want to go through views in the database, rather than directly access the tables.
For data modification steps, you want to wrap insert/update/delete in stored procedures. This has secondary benefits, where you can handle constraints and triggers in the stored procedure and better log what is happening.
For security, you want to include database security as part of your security architecture. Giving all users full access may not be the best approach.
Unfortunately, it is easy to write a simple app that uses a database directly, whether in java or ruby or VBA or whatever. This grows into a bigger app, and then the maintenance problems arise.
I would suggest an incremental approach to fixing this. Go through the code and create views where you have nasty select statements. You'll probably find you need many fewer views than selects (the views can be re-used -- a good thing).
Find places where code is being modified, and change these to stored procedures. I always return status from the stored procedure for error checking and put log information into a table called someting like splog or _spcalls.
If you want to limit permissions for different users of your app, then you might be interested in this.
Leaving the raw SQL statements in the code is a problem. Just wait until you want to rename a column and you have to find all the places where this breaks the code.
Yes, this is not optimal - maintenance becomes a nightmare; it's hard to forecast and determine which code must change when underlying DB changes occur. This is why it is good practice to create a data access layer (DAL) to encapsulate CRUD operations from the application logic. There is often an business logic layer (BLL) between the application logic and DAL to enforce business rules/logic.
Google "data access layer" "business logic layer" and even "n-tier architecture" to learn more.
If you are concerned about the SQL statements littered around your application logic, maybe consider implementing them as Stored Procedures?
That way you will only be including the procedure name and any parameters that need to be passed to it in your code.
It has other benefits too, a common one being easier to re-use in multiple files.
There is much debate about speed and security of Stored Procedure and you will never get a definitive answer about that so I won't even open that can of worms.
Here is how you do this with Java: Create a class that encapsulates all access to the database. Add a method to the class for each query you need to run.
The answer for ruby will be similar to this.
It depends on the architecture of your application but a simple solution is to keep each sql in a file, qry.sql. For each Ruby module (or whatever is used in Ruby to aggregate related code) you can keep a folder SQL with these files. So, the collection of SQL folder/files form the data access layer of your application. The Ruby code provides the business layer. If your data model changes (field names, etc), you can do greps to identify the sql files that need changes. Anyway, definitely separate SQL from your logic code.

Where do ORMs fall through?

I often hear people bashing ORMs for being inflexible and a "leaky abstraction", but you really don't hear why they're problematic. When used properly, what exactly are the faults of ORMs? I'm asking this because I'm working on a PHP orm and I'd like for it to solve problems that a lot of other ORMs fail at, such as lazy loading and the lack of subqueries.
Please be specific with your answers. Show some code or describe a database schema where an ORM struggles. Doesn't matter the language or the ORM.
One of the bigger issues I have noticed with all the ORMs I have used is updating only a few fields without retrieving the object first.
For example, say I have a Project object mapped in my database with the following fields: Id, name, description, owning_user. Say, through ajax, I want to just update the description field. In most ORMs the only way for me to update the database table while only having an Id and description values is to either retrieve the project object from the database, set the description and then send the object back to the database (thus requiring two database operations just for one simple update) or to update it via stored procedures (which is the method I am currently using).
Objects and database records really aren't all that similar. They have typed slots that you can store stuff in, but that's about it. Databases have a completely different notion of identity than programming languages. They can't handle composite objects well, so you have to use additional tables and foreign keys instead. Most have no concept of type inheritance. And the natural way to navigate a network of objects (follow some of the pointers in one object, get another object, and dereference again) is much less efficient when mapped to the database world, because you have to make multiple round trips and retrieve lots of data that you didn't care about.
In other words: the abstraction cannot be made very good in the first place; it isn't the ORM tools that are bad, but the metaphor that they implement. Instead of a perfect isomorphism it is is only a superficial similarity, so the task itself isn't a very good abstraction. (It is still way more useful than having to understand databases intimately, though. The scorn for ORM tools come mostly from DBAs looking down on mere programmers.)
ORMs also can write code that is not efficient. Since database performance is critical to most systems, they can cause problems that could have been avoided if a human being wrote the code (but which might not have been any better if the human in question didn't understand database performance tuning). This is especially true when the querying gets complex.
I think my biggest problem with them though is that by abstracting away the details, junior programmers are getting less understanding of how to write queries which they need to be able to to handle the edge cases and the places where the ORM writes really bad code. It's really hard to learn the advanced stuff when you never had to understand the basics. An ORM in the hands of someone who understands joins and group by and advanced querying is a good thing. In the hands of someone who doesn't understand boolean algebra and joins and a bunch of other basic SQL concepts, it is a very bad thing resulting in very poor design of database and queries.
Relational databases are not objects and shouldn't be treated as such. Trying to make an eagle into a silk purse is generally not successful. Far better to learn what the eagle is good at and why and let the eagle fly than to have a bad purse and a dead eagle.
The way I see it is like this. To use an ORM, you have to usually stack several php functions, and then connect to a database and essentially still run a MySQL query or something similar.
Why all of the abstraction in between code and database? Why can't we just use what we already know? Typically a web dev knows their backend language, their db language (some sort of SQL), and some sort of frontend languages, such as html, css, js, etc...
In essence, we're trying to add a layer of abstraction that includes many functions (and we all know php functions can be slower than assigning a variable). Yes, this is a micro calculation, but still, it adds up.
Not only do we now have several functions to go through, but we also have to learn the way the ORM works, so there's some time wasted there. I thought the whole idea of separation of code was to keep your code separate at all levels. If you're in the LAMP world, just create your query (you should know MySQL) and use the already existing php functionality for prepared statements. DONE!
LAMP WAY:
create query (string);
use mysqli prepared statements and retrieve data into array.
ORM WAY:
run a function that gets the entity
which runs a MySQL query
run another function that adds a conditional
run another function that adds another conditional
run another function that joins
run another function that adds conditionals on the join
run another function that prepares
runs another MySQL query
run another function that fetches the data
runs another MySQL Query
Does anyone else have a problem with the ORM stack? Why are we becoming such lazy developers? Or so creative that we're harming our code? If it ain't broke don't fix it. In turn, fix your dev team to understand the basics of web dev.
ORMs are trying to solve a very complex problem. There are edge cases galore and major design tradeoffs with no clear or obvious solutions. When you optimize an ORM design for situation A, you inherently make it awkward for solving situation B.
There are ORMs that handle lazy loading and subqueries in a "good enough" manner, but it's almost impossible to get from "good enough" to "great".
When designing your ORM, you have to have a pretty good handle on all the possible awkward database designs your ORM will be expected to handle. You have to explicitly make tradeoffs around which situations you are willing to handle awkwardly.
I don't look at ORMs as inflexible or any more leaky than your average complex abstraction. That said, certain ORMs are better than others in those respects.
Good luck reinventing the wheel.

When to use an ORM (Sequel, Datamapper, AR, etc.) vs. pure SQL for querying

A colleague of mine is currently designing SQL queries like the one below to produce reports, which are displayed in excel files through an external data query.
At present, only reporting processes on the DB are required (no CRUD operations).
I am trying to convince him that it would be better to use a ruby ORM in order to be able to display the data in a rails/sinatra app.
Despite the obvious advantages in displaying the data, what advantages are there for him in learning to use an ORM like Sequel or Datamapper?
The SQL queries he is writing are clearly quite complex, and being relatively new to SQL, he often complains that it is very time-consuming and confusing.
Is it possible to write extremely complex queries with an ORM? and if so, which is the most suitable(I have heard Sequel is good for legacy dbs)? and what are the advantages of learning ruby and using an ORM versus sticking with plain SQL, in making complex database queries?
I'm the DataMapper maintainer, and I think for complex reporting you should use SQL.
While I do think someday we'll have a DSL that provides the power and conciseness of SQL, everything I've seen so far requires you to write more Ruby code than SQL for complex queries. I would much rather maintain a 5 line SQL query than 10-15 lines of Ruby code to describe the same complex operation.
Please note I say complex.. if you have something simple, use the ORM's build-in finders. However, I do believe there is a line you can cross where SQL becomes simpler. Now, most apps aren't just reporting. You may have alot of CRUD type operations, for which an ORM is perfectly suited and far better than doing those things by hand.
One thing that an ORM will usually provide is some sort of organization to your application logic. You can group code based around each model in the same file. It's usually there that I'll put the complex SQL query, rather than embedding it in the controller, eg:
class User
include DataMapper::Resource
property :id, Serial
property :name, String, :length => 1..100, :required => true
property :age, Integer, :min => 1, :max => 130
def self.some_complex_query
repository.adapter.select <<-SQL
SELECT ...
FROM ...
WHERE ...
... more complex stuff here ...
SQL
end
end
Then I can just generate the report using User.some_complex_query. You could also push the SQL query into a view if you wanted to further cleanup this code.
EDIT: By "view" in the above sentence I meant RDBMS view, rather than view in the MVC context. Just wanted to clear up any potential confusion.
If you are writing your queries by hand you have the chance to optimize them. When I look at that query I see some potential for optimizations (E.ICGROUPNAME LIKE '%san-fransisco%' or E.ICGROUPNAME LIKE '%bordeaux%' wont use an index = Table Scan).
When using an OR Mapper (the native Objects/Tables) for reporting you have no or little control over the resulting SQL Query.
But: You could put that query in an View or Stored Procedure and map that View/Proc with an OR Mapper. You can optimize your queries and you can use all features of your Application Framework.
Unless you're dealing with objects, an ORM is not necessary. It sounds like your friend simply needs to generate reports, in which case pure SQL is just fine so long as he knows what he's doing (e.g. avoiding SQL injection issues).
ORM stands for "Object-Relational Mapping". If you don't have the "O" (objects), then it's probably not a good fit for your app. Where ORMs really shine is in persisting objects to the database and loading them from a database.
ORM stands for Object Relational Mapping - but looking at the query your friend seems to be wanting a pretty specific table of sums and other items... I've not used Ruby's Sequel, but I've used Hibernate, and Python's SQLAlchemy (for Django/Turbogears) and while you can do these sorts of queries, I don't believe that is their strength.
The power of ORM comes from being able to finding Foo->Bar object relationships, say you want all the Bar objects for Foo's field greater then X... That sort of thing. Therefore I would not classify an ORM as a "good" solution, though moving to a real programming language like Ruby and doing the SQL through it instead of Excel... that in itself is a win.
Just my 2 cents.
In a situation like that, I'd probably write them by hand or use a View (if the DB you're using supports views)
ORM's are used when you have Objects (Business Objects). I am therefore assuming that you have an application with which you creating and Managing the Business Objects that are ultimately saved into the database. If you have then you have almost definitely got some representation of the relationships and probably many of the calculations you are going to use in reports. The problem with using SQL to directly access your database for reports is simply maintainability.
You typically put a lot of effort into ensuring that your Business Objects hide any details of their database. You implement business rules and do common calculations in your Business Objects. Build a common language for all members of the team etc etc. You then use an ORM to map to the database and use Habanero or NHibernate or something like that to do this. This is all great. We do this all in the name of Maintainability and is great. You can migrate your application change your design etc etc.
You now go and write SQL to run reports over time you have hundreds of report. Firstly they often duplicate logic you already have in your BusinessObjects (Usually without any tests) and even worse Bham Damb sorry maintainability is now stuffed forget about moving a that field from one table to another forget about splitting that table into two changing that relationship etc you have a number of reports that are going to break unexpectedly.
The problem with quering through your Domain Objects/Business Objects is simply one of performance.
In summary if you are using Domain Driven Design or Business Object concepts try to use these for reports. (You will probably run directly from DB using SQL or stored procs for performance reasons but try limit these use your Business Objects first and then use SQL).
The other option of course is using a separate reporting database (Like some of the BI concepts) The mapping from your transactional DB to your reporting DB is therefore in one place and easily changeable in cases where you want to change your design.
Domain Objects (Business Objects) and ORMs have all the knowledge to allow you to start building high performing queries that run directly on the Database while using the Domain Terminology. Lets hope that these continue to evolve to a point where this is a reality.
Until then if you are using Business Objects in your application try use them for Reporting when performance is an issue resort to SQL.

Convincing a die hard DBA to use an ORM for the majority of CRUD vs Stored Procedures, View, and Functions

I have been working with NHibernate, LINQ to SQL, and Entity Framework for quite some time. And while I see the benefits to using an ORM to keep the development effort moving quickly, the code simple, and the object relational impedance mismatch to a minimum, I still find it very difficult to convince a die hard SQL dba of an ORM's strengths. From my point of view an ORM can be used for at least 90-95% of all of your data access leaving those really hairy things to be done in procedures or functions where appropriate. I am by no means the guy that says we must do everything in the ORM!
Question: What are some of the better arguments for convincing an old school dba that the use of an ORM is not the absolute worst idea ever conceived by a programmer!
If you want to convince him, first you need to understand what his problem is with use of an ORM. Giving you a list of generic benefits is unlikely to help if it does not address the issues he has.
However, my first guess as to his issue would be that it prevents him from doing any optimisation because you're accessing tables directly so he has no layer of abstraction behind which to work, so if a table needs altering or (de)normalizing then he can't do it without breaking your application.
If you're wondering why a DBA would feel like this, and how to respond to it, then it's roughly the same as him coming up to you and saying he wants you to make all the private fields in your classes public, and that you can't change any of them without asking him first. Imagine what it would take for him to convince you that's a good idea, and then use the same argument on him.
Explain to them that creating a stored procedure for every action taken by an application is unmaintainable on several levels.
If the schema changes it's difficult
to track down all the stored
procedures that are affected.
It's impossible ensure that multiple
stored procedures aren't created to
do the same thing, or if slightly
altering an existing stored
procedure is going to have serious
ramifications.
It's difficult to make sure that the
application and database are in
sync after a deploy.
Dynamic SQL has all these issues and more.
I guess, my first question to "Convincing a die hard DBA to use an ORM" would be: Is the DBA also a programmer that also works outside the DB so that he/she would "use an ORM"? If not then why would the DBA give up a major part of their job to someone else and thereby significantly reduce their overall usefulness to the company? They wouldn't.
In any case, the best way to convince any engineer of anything is with empirical data. Setup a prototype with a few parts of the real application ported to ORM for the purpose of your demonstration and actually prove your points.
On another point I think you don't get the object relational impedance dilemma if you're trying to use that as an argument to use an Object-Relation-Mapper. The DBA could quote from that link you posted where where it says "Mapping such private object representation to database tables makes such databases fragile according to OOP philosophy" and that the issue is further pronounced "particularly when objects or class definitions are mapped (ORM) in a straightforward way to database tables or relational schemata" So according to your own link, by promoting ORM you are promoting the problem.
By using sprocs the DBA is free to make changes to the underlying schema, so long as the sproc still returns the same columns with the same types. Thusly with this abstraction that sprocs add, the direct schema mapping issues become nought. This does not mean however that you need to give up your beloved EF since EF can now be used quite happily with sprocs.
Procedures used to be more efficient because of predictable caching mechanisms. However, many DBA's overkill the procedures, introducing lots of branching logic with IF commands, resulting in an scenarios where they become uncacheable.
Next, procedures are only useful if you plan to span data logic across multiple platforms; a website and separate client application, for example. If you're only making a web application, the procedures introduce an unnecessary level of abstraction and more things to juggle. Having to adjust a table, then a procedure, then a data model is a lot of work when adjusting a single model via the ORM would suffice.
Lastly, procedures couple your code to your database very tightly. If you want to migrate to a different database you have to migrate all the procedures, some of which may need to be heavily rewritten. This sort of migration is significantly easier with an ORM since you can yank out the backend and install a new one without the frontend application knowing the difference.

Can you have too many stored procedures?

Is there such a thing as too many stored procedures?
I know there is not a limit to the number you can have but is this any performance or architectural reason not to create hundreds, thousands??
To me the biggest limitation to that hundreds or thousands store procedure is maintainability. Even though that is not a direct performance hit, it should be a consideration. That is an architectural stand point, you have to plan not just for the initial development of the application, but future changes and maintenance.
That being said you should design/create as many as your application requires. Although with Hibernate, NHibernate, .NET LINQ I would try to keep as much store procedures logic in the code, and only put it in the database when speed is a factor.
Yes you can.
If you have more than zero, you have too many
Do you mean besides the fact that you have to maintain all of them when you make a change to the database? That's the big reason for me. It's better to have fewer ones that can handle many scenarios than hundreds that can only handle one.
I would just mention that with all such things maintenance becomes an issue. Who knows what they all are and what purpose they serve? What happens years down the way when they are just artifacts of a legacy system that no one can recall what they are for but are scared to mess with?
I think the main thing is the question of not is it possible, but should it be done. Thats just one thought anyway.
I have just under 200 in a commercial SQL Server 2005 product of mine, which is likely to increase by about another 10-20 in the next few days for some new reports.
Where possible I write 'subroutine' sprocs, so that anytime I find myself putting 3 or 4 identical statements together in more than a couple of sprocs, it's time to turn those few statements into a subroutine, if you see what I mean.
I don't tend to use the sprocs to perform all the business logic as such, I just prefer to have sprocs doing anything that could be seen as 'transactional' - so where as my client code (in Delphi but whatever) might do the odd insert or update itself, as soon as something requires a couple of things to be updated or inserted 'at once', it's time for a sproc.
I have a simple, crude naming convention to assist in the readability (and in maintenance!)
The product's code name is 'Rachel', so we have
RP_whatever - these are general sprocs that update/insert data,
- or return small result sets
RXP_whatever - these are subroutine sprocs that server as 'functions'
- or workers to the RP_ type procs
REP_whatever - these are sprocs that simply act as glorified views almost
- they don't alter data, they return potentially complex result sets
- for reporting purposes, etc
XX_whatever - these are internal test/development/maintenance sprocs
- that the client application (or the local dba) would not normally use
the naming is arbitrary, it's just to help distinguish from the sp_ prefix that SQL Server uses.
I guess if I found I had 400-500 sprocs in the database I might become concerned, but a few hundred isn't a problem at all as long as you have a system for identifying what kind of activity each sproc is responsible for. I'd rather chase down schema changes in a few hundred sprocs (where the SQL Server tools can help you find dependencies etc) than try to chase down schema changes in my high-level programming language.
I suppose the deeper question here is- where else should all of that code go? Views can be used to provide convenient access to data through standard SQL. More powerful application languages and libraries can be used to generate SQL. Stored procedures tend to obscure the nature of the data and introduce an unnecessary layer of abstraction, complexity, and maintenance to a system.
Given the power of object relational mapping tools to generate basic SQL, any system that is overly dependent on stored procedures is going to be less efficient to develop. With ORM, there is simply a lot less code to write.
Hell yes, you can have too many. I worked on a system a couple of year ago that had something like 10,000 procs. It was insane. The people who wrote the system didn't really know how to program but they did know how to right badly structured procs, so they put almost all of the application logic in the procs. Some of the procs ran for thousands of line. Managing the mess was a nightmare.
Too many (and I can't draw you a specific line in the sand) is probably an indicator of poor design. Besides, as others have pointed out, there are better ways to achieve a granular database interface without resorting to massive numbers of procs.
My question is why aren't you using some sort of middleware platform if you're talking about hundreds or thousands of stored procedures?
Call me old fashioned but I thought the database should just hold the data and the program(s) should be the one making the business logic decisions.
in Oracle database you can use Packages to group related procedures together.
a few of those and your namespace would free up.
Too much of any particular thing probably means you're doing it wrong. Too many config files, too many buttns on the screen ...
Can't speak for other RDBMSs but an Oracle application that uses PL/SQL should be logically bundling procedures and functions into packages to prevent cascading invalidations and manage code better.
To me, using lots of stored procedures seems to lead you toward something equivalent to the PHP API. Thousands of global functions which may have nothing to do with eachother all grouped together. The only way to relate between them is to have some naming convention where you prefix each function with a module name, similar to the mysql_ functions in PHP. I think this is very hard to maintain, and very hard to to keep everything consistent. I think stored procedures work well for things that really need to take place on the server. A stored procedure for a simple select query, or even a select query with a join is probably going to far. Only use stored procedures where you actually require advanced logic to be handled on the database server.
As others have said, it really comes down to the management aspect. After a while, it turns into finding the proverbial needle in the haystack. That's even more so when there's poor naming standards in place...
No, not that I know of. However, you should have a good naming convention for them. For example, starting them with usp_ is something that a lot of poeple like to do (fastr than starting with sp_).
Maybe your reporting sprocs should be usp_reporting_, and your bussiness objects should be usp_bussiness_, etc. As long s you can manage them, there shouldn't be a problem with having a lot of them.
If they get too big you might want to split the database up into multiple databases. This might make more logical sense and can help with database size and such.
Yep, you can have toooooo many stored procedures.
Naming conventions can really help with this. For instance, if you have a naming convention where you always have the table name and InsUpd or Get, it's pretty easy to find the right stored procedure when you need it. If you don't have some kind of standard naming convention it would be really easy to come up with two (or more) procedures that do almost exactly the same thing.
It would be easy to find a couple of the following without having a standardized naming convention... usp_GetCustomersOrder, CustomerOrderGet_sp, GetOrdersByCustomer_sp, spOrderDetailFetch, GetCustOrderInfo, etc.
Another thing that starts happening when you have a lot of stored procedures is that you'll have some stored procedures that are never used and some that are just rarely used... If you don't have some way of tracking stored procedure usage you'll either end up with a lot of unused procedures... or worse, getting rid of one you think is never used and finding out afterwards that it's only used once a year or once a quarter. :(
Jeff
Not while you have sane naming conventions, up-to-date documentation and enough unit tests to keep them organized.
My first big project was developed with an Object Relational Mapper that abstracted the database so we did everything object oriented and it was easier to mantain and fix bugs and it was especially easy to make changes since all the data access and business logic was C# code, however, when it came to do complex stuff the system felt slow so we had to work around those things or re-architect the project, especially when the ORM had to do complex joins in the database.
Im now working on a different company that has a motto of "our applications are just frontends of the database" and it has its advantages and disadvantages. We have really fast applications since all of the data operations are done on stored procedures, this is using mainly SQL Server 2005, but, when it comes to make a change, or a fix to the software its harder because you have to change both, the C# code and the SQL stored procedures so its like working twice, contrary to when I used an ORM, you dont have refactoring or strongly typed objects in sql, Management Studio and other tools help a lot but normally you spend more time going this way.
So I would say that deppending on the needs of the project, if you are not going to keep really complex business data than maybe you could even avoid using stored procedures at all and use an ORM that makes developer life's easier. If you are concerned about performance and need to save all of the resources than you should use stored procedures, of course, the quantity deppends uppon the architecture that you design, so for exmaple if you always have stored procedures for CRUD operations you would need one for inserts, one for update, one for deletion, one for selecting and one for listing since these are the most common operations, if you have 100 business objects, multiply these 4 by that and you get 400 stored procedures just to manage the most basic operations on your objects so, yes, they could be too many.
If you have a lot of stored procedures, you'll find you are tying yourself to one database - some of them may not easily be transferred.
Tying yourself to one database isn't good design.
Moreover, if you have business logic on the database and in a business layer then maintenance becomes a problem.
i think as long as you name them uspXXX and not spXXX the lookup by name will be direct, so no downside - though you might wonder about generalizing the procs if you have thousands...
You have to put all that code somewhere.
If you are going to have stored procedures at all (I personally do favour them, but many do not), then you might as well use them as much as you need to. At least all the database logic is in one place. The downside is that there isn't typically much structure to a bucket full of stored procs.
Your call! :)