Using extract method on stored procedures

Extract method is a common refactoring pattern when writing code in programming languages.
While refactoring my stored procedures, I am wondering whether it is also good practice to apply extract method to stored procedures (SPs) and user-defined functions (UDFs), since an SP/UDF can call other SPs/UDFs.
Does it affect performance?
Thanks in advance.

Just my opinion (working for several years with databases now):
Stored procedures should be used for database tasks only: for example, migrating data (I'm currently working on a process that transforms a database structure), dynamic queries (where a SQL statement is built on the fly), or a procedure that builds a table (such as a table holding the dates in a specific date range).
Not for anything else! For anything more complicated than the examples above, consider coding it in the application layer.
Also, you may have heard that it's wise to put as much business logic into the database as possible. That's true for the database design, but it does not mean that you should code almost everything in it. Databases are not good at that (I'm talking about data transformation, for example). A programming language like PHP or whatever is faster!
So, for everything I have used stored procedures for, I never felt the need to split anything out into extra procedures. The one exception is restructuring a database (in my case an ETL process that denormalizes data into a star schema for better performance): there I wrote a procedure for every table, and these procedures are called from a procedure that manages the whole process. But again, it's nothing like a programming language.
Also, take this example of the extract method pattern: http://www.refactoring.com/catalog/extractMethod.html
Having something like this in your database will become a debugging nightmare, and you will spend way too much time coding. And again, the cases where a stored procedure should be used are not cases where it makes sense to apply the extract method pattern.

Related

How to avoid SQL statements spreading everywhere in your app?

I have a medium-sized app written in Ruby, which makes pretty heavy use of an RDBMS. As our code grows, I have found ugly SQL statements spreading to all the modules and methods in my app, embedded in a lot of application logic. I am not sure if this is bad; however, my gut tells me this is quite ugly...
So, generally, in any language, how do you manage your SQL statements? Or do you think it is harmful for maintainability to have many SQL statements embedded in the application logic? Why or why not?
Thanks.
SQL is a language for accessing databases. Often, it gets confused as being the API into the data store for a larger application. In fact, you should design a real API between the data store and the app.
This means several things.
For accessing data stored in tables, you want to go through views in the database, rather than directly access the tables.
For data modification steps, you want to wrap insert/update/delete in stored procedures. This has secondary benefits, where you can handle constraints and triggers in the stored procedure and better log what is happening.
For security, you want to include database security as part of your security architecture. Giving all users full access may not be the best approach.
Unfortunately, it is easy to write a simple app that uses a database directly, whether in java or ruby or VBA or whatever. This grows into a bigger app, and then the maintenance problems arise.
I would suggest an incremental approach to fixing this. Go through the code and create views where you have nasty select statements. You'll probably find you need many fewer views than selects (the views can be re-used -- a good thing).
Find places where data is being modified, and change these to stored procedures. I always return a status from the stored procedure for error checking and put log information into a table called something like splog or _spcalls.
If you want to limit permissions for different users of your app, then you might be interested in this.
Leaving the raw SQL statements in the code is a problem. Just wait until you want to rename a column and you have to find all the places where this breaks the code.
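A minimal sketch of the view idea, using Python's built-in sqlite3 module and an in-memory database as a stand-in for whatever RDBMS the app really uses. The table, view, and column names are hypothetical. The application only ever queries the view, so when the underlying column is renamed (simulated here by rebuilding the table), only the view definition changes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, full_name TEXT, active INTEGER);
    INSERT INTO customers (full_name, active) VALUES ('Ada', 1), ('Bob', 0), ('Cy', 1);

    -- The view is the only object application code queries.
    CREATE VIEW active_customers AS
        SELECT id, full_name AS name FROM customers WHERE active = 1;
""")


def active_customer_names(conn):
    """Application-side read: goes through the view, never the table."""
    rows = conn.execute("SELECT name FROM active_customers ORDER BY id").fetchall()
    return [name for (name,) in rows]


names_before = active_customer_names(conn)

# Simulate a schema change: the column full_name becomes display_name.
# Only the view is redefined; active_customer_names() is untouched.
conn.executescript("""
    DROP VIEW active_customers;
    CREATE TABLE customers_v2 (id INTEGER PRIMARY KEY, display_name TEXT, active INTEGER);
    INSERT INTO customers_v2 SELECT id, full_name, active FROM customers;
    DROP TABLE customers;
    CREATE VIEW active_customers AS
        SELECT id, display_name AS name FROM customers_v2 WHERE active = 1;
""")

names_after = active_customer_names(conn)
```

Both calls return the same list, which is the point: the rename is absorbed by the view rather than by every SELECT in the code base.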
Yes, this is not optimal: maintenance becomes a nightmare, and it's hard to forecast and determine which code must change when underlying DB changes occur. This is why it is good practice to create a data access layer (DAL) to encapsulate CRUD operations and keep them out of the application logic. There is often a business logic layer (BLL) between the application logic and the DAL to enforce business rules/logic.
Google "data access layer" "business logic layer" and even "n-tier architecture" to learn more.
If you are concerned about the SQL statements littered around your application logic, maybe consider implementing them as Stored Procedures?
That way you will only be including the procedure name and any parameters that need to be passed to it in your code.
It has other benefits too, a common one being easier to re-use in multiple files.
There is much debate about speed and security of Stored Procedure and you will never get a definitive answer about that so I won't even open that can of worms.
Here is how you do this with Java: Create a class that encapsulates all access to the database. Add a method to the class for each query you need to run.
The answer for ruby will be similar to this.
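As a rough illustration of that shape (sketched in Python with the built-in sqlite3 module rather than Java or Ruby; the `UserStore` class and the `users` table are made-up names), every SQL statement lives in one class and callers only see methods:

```python
import sqlite3


class UserStore:
    """The only place that knows the SQL for the hypothetical `users` table."""

    def __init__(self, conn):
        self._conn = conn

    def create_schema(self):
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS users ("
            "id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT NOT NULL UNIQUE)"
        )

    def add_user(self, name, email):
        """Insert a user and return the generated id."""
        cur = self._conn.execute(
            "INSERT INTO users (name, email) VALUES (?, ?)", (name, email)
        )
        return cur.lastrowid

    def find_by_email(self, email):
        """Return the user as a dict, or None if there is no match."""
        row = self._conn.execute(
            "SELECT id, name, email FROM users WHERE email = ?", (email,)
        ).fetchone()
        if row is None:
            return None
        return {"id": row[0], "name": row[1], "email": row[2]}


store = UserStore(sqlite3.connect(":memory:"))
store.create_schema()
new_id = store.add_user("Ada", "ada@example.com")
found = store.find_by_email("ada@example.com")
```

The rest of the application calls `add_user` and `find_by_email` and never builds a SQL string itself.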
It depends on the architecture of your application, but a simple solution is to keep each SQL statement in its own file, such as qry.sql. For each Ruby module (or whatever is used in Ruby to aggregate related code) you can keep a folder named SQL with these files. The collection of SQL folders and files then forms the data access layer of your application, and the Ruby code provides the business layer. If your data model changes (field names, etc.), you can grep to identify the SQL files that need changes. Anyway, definitely separate SQL from your logic code.
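One possible way to wire this up, shown in Python with sqlite3 as a stand-in (the folder layout, the file name `users_by_city.sql`, and the `load_queries` helper are all assumptions for the sake of the example): load every .sql file in a folder into a dictionary keyed by file name, and let the code refer to queries by name only.

```python
import sqlite3
import tempfile
from pathlib import Path


def load_queries(sql_dir):
    """Map each *.sql file's stem to its text, e.g. 'users_by_city' -> 'SELECT ...'."""
    return {
        path.stem: path.read_text(encoding="utf-8")
        for path in sorted(Path(sql_dir).glob("*.sql"))
    }


# Build a throwaway sql/ folder so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    sql_dir = Path(tmp) / "sql"
    sql_dir.mkdir()
    (sql_dir / "users_by_city.sql").write_text(
        "SELECT name FROM users WHERE city = :city ORDER BY name\n",
        encoding="utf-8",
    )
    queries = load_queries(sql_dir)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (name TEXT, city TEXT);
    INSERT INTO users VALUES ('Ada', 'London'), ('Bob', 'Paris'), ('Cy', 'London');
""")

# Application code names the query; the SQL text itself stays in the file.
rows = conn.execute(queries["users_by_city"], {"city": "London"}).fetchall()
```

Because the SQL sits in plain files, a grep for a column name finds every statement that would be affected by a rename.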

Stored Procedure structuring

Often I have seen stored procs used for writing business logic in an application. Sometimes these procs contain 1000+ lines of code. If I wrote a method/function in application code that contained 1000 lines, it would rightly be criticised. Should long stored procs be broken down into separate procs, like methods in a class would be? Why isn't this done more, as it would certainly make the code more usable?
It sounds to me like you're to the point where you need to start thinking about a service layer for your database. This will allow you to move the business logic into a more appropriate language for lots of procedural code, while still enforcing access to the database through your approved api.
First, I agree with Nat's answer: the tools (such as debuggers) for T-SQL debugging provide nowhere near the functionality that one finds for other languages.
Second, there are a number of potential challenges when passing values between stored procedures. Passing simple data types is straightforward. Passing involved data types becomes more complex. Using temporary tables, XML, delimited strings, record sets, etc. requires additional coding, creates additional overhead, and has performance implications.
My "rule" is that if the input and output parameters can be handled with the standard methods (i.e. standard data types), then breaking up the stored procedure is warranted. If passing the input and output requires a lot of coding effort, then the stored procedure remains large.
I think "lines of code" is a poor measure of how reusable the code is. I think you need to take a much more qualitative look at these "long" procedures. I've had several long procedures in the past, and whether any of the code can be shortened and modularized really depends - is any of the logic really reused by other applications or is this more of a textbook desire? I am sure there are plenty of modules out there in enterprise applications that are more than 1000 lines of code and don't need to be criticized or broken down into smaller parts...
Does that mean the procedures you've seen that are 1000+ lines of code are justified? Of course not. I just wanted to stress that number of lines of code is not the only factor you should be looking at.
Absolutely, long stored procs should be broken down into short blocks of code that are re-usable and robust. Unfortunately, the language and tools used in database development don't support doing this in a practical manner.
When your SP is this big, you can definitely say that the procedure 'does many things'. If so, you should do each of these things separately.
In my experience, if you need to support this functionality, it is easier to refactor such an SP once than to keep working with 1000 lines of spaghetti code.

Micro ORM - maintaining your SQL query strings

I will not go into the details of why I am exploring the use of Micro ORMs at this stage, except to say that I feel powerless when I use a full-blown ORM. There are too many things going on in the background that happen automatically, and not all of them are the best possible choices. I was quite ready to go back to raw database access, but I found out about the three new guys on the block: Dapper, PetaPoco and Massive. So I decided to give the low-level approach a go with a pet project. It is not relevant, but so far, I am using PetaPoco.
In any case, I am having trouble deciding how to go about maintaining the SQL strings that I will use from the higher levels. There are three main solutions that I can think of:
Sprinkle the SQL queries wherever I need them. This is the least infrastructure heavy method. However, it suffers in both maintainability and testability areas.
Limit the query usage to some service classes. This helps maintainability, is still low on infrastructure I need to implement. It may also be possible to build these service classes such that it would be easy to mock for testing purposes.
Prepare some classes to make the system somewhat flexible. I have started on this path. I implemented a Repository interface and a database-dependent Repository class. I have also built some tiny interfaces to capture SQL queries that can be passed to my Repository's GetMany() method. All the queries are implemented as individual classes right now, and I will probably need a little more interface around this to add some level of database independence, and maybe some flexibility in decorating queries into paged and sorted queries (again, this would also make them a little more flexible in handling different databases).
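To make the third option a bit more tangible, here is a small sketch of the query-object idea in Python with sqlite3 (not PetaPoco or C#). All names here (`Query`, `Repository.get_many`, `paged`, the `products` table) are hypothetical and only loosely mirror the GetMany() design described above:

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """A SQL string plus its positional parameters."""
    sql: str
    params: tuple = ()


def products_cheaper_than(limit):
    """One explicitly written query, captured as an object."""
    return Query("SELECT name, price FROM products WHERE price < ?", (limit,))


def paged(query, order_by, page, page_size):
    """Decorate any Query into a sorted, paged Query."""
    if not order_by.isidentifier():  # the column name is interpolated, so validate it
        raise ValueError(f"invalid column name: {order_by!r}")
    sql = f"SELECT * FROM ({query.sql}) ORDER BY {order_by} LIMIT ? OFFSET ?"
    return Query(sql, query.params + (page_size, page * page_size))


class Repository:
    """Database-dependent part: the only code that touches the connection."""

    def __init__(self, conn):
        self._conn = conn

    def get_many(self, query):
        return self._conn.execute(query.sql, query.params).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (name TEXT, price REAL);
    INSERT INTO products VALUES ('pen', 1.5), ('book', 12.0), ('bag', 30.0), ('cap', 8.0);
""")
repo = Repository(conn)
cheap = products_cheaper_than(20)
first_page = repo.get_many(paged(cheap, "price", page=0, page_size=2))
second_page = repo.get_many(paged(cheap, "price", page=1, page_size=2))
```

The SQL stays hand-written and visible, while paging and sorting are bolted on by a decorator function. The LIMIT/OFFSET syntax is itself dialect-specific, which is the kind of detail that pushes this design toward the "writing an ORM, but badly" worry.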
What I am mainly worried about right now is that I have entered the slippery slope of writing all the functions needed for a full blown ORM, but badly. For example, it feels sensible right now that I write or find a library to convert linq calls into SQL statements so that I can massage my queries easily or write extenders that can decorate any query I pass to it, etc. But that is a large task, and is already done by the big guys, so I am resisting the urge to go there. I also want to retain control over what queries I send to the database - by explicitly writing them.
So what is the suggestion? Should I go #2 option, or try to stumble along on option #3? I am certain I cannot show any code written in the first option to anyone without blushing. Is there any other approach you can recommend?
EDIT: After asking the question, I realized there is another option, somewhat orthogonal to these three: stored procedures. There seem to be a few advantages to putting all your queries inside the database as stored procedures. They are kept in a central location and not spread through the code (though maintenance is an issue, since the parameters may get out of sync). The reliance on database dialect is solved automatically: if you move databases, you port all your stored procedures, and you are done. And there are also the security benefits.
With the stored procedure option, alternatives 1 and 2 seem a little more suitable. There do not seem to be enough entities to warrant option 3, but it is still possible to separate the procedure call commands from the database access code.
I've implemented option 3 without stored procedures, and option 2 with stored procedures, and it seems like the latter is more suitable for me (in case anyone is interested in the outcome of the question).
I would say put the sql where you would have put the equivalent LINQ query, or the sql for DataContext.ExecuteQuery. As for where that is... well, that is up to you and depends on how much separation you want. - Marc Gravell, creator of Dapper
See Marc's opinion on the matter
I think the key point is that you shouldn't really be re-using the SQL. If your logic is re-used, then it should be wrapped in a method that can then be called from multiple places.
I know you've accepted your answer already but I still wanted to show you a nice alternative that may be helpful in your case as well. Now or in the future.
When using stored procedures it's wise to use T4
I tend to use stored procedures on my project even though it's not using PetaPoco, Dapper or Massive (project started before these were here). It uses BLToolkit instead. Anyway. Instead of writing my methods to run stored procedures and write code to provide stored procedure parameters, I've written a T4 template that generates the code for me.
Whenever stored procedures change (some may be added/removed, parameters added/removed/renamed/retyped), my code will break on compilation because method calls will not match their signature any more.
I keep my stored procedures in a file (so they get version controlled). If you work in a multi-developer team it may be sensible to have stored procedures each in its own file. It makes updates much less painful. I've experienced that on some project and it worked ok as long as number of SPs is not huge. You can restructure them into folders based on the entity they're related to.
Anyway. Maintenance is related to stored procedures, code change is just a simple click of a button in Visual Studio that converts all T4s at once. You don't have to search your methods that use those procedures. You'll be reported errors while compiling. One thing less to worry about.
So instead of writing
    using (var db = new DbManager())
    {
        return db
            .SetSpCommand(
                "Person_SaveWithRelations",
                db.Parameter("@Name", name),
                db.Parameter("@Email", email),
                db.Parameter("@Birth", birth),
                db.Parameter("@ExternalID", exId))
            .ExecuteObject<Person>();
    }
and having a bunch of magic strings I can just simply write:
    using (var db = new DataManager())
    {
        return db
            .Person
            .SaveWithRelations(name, email, birth, exId)
            .ExecuteObject<Person>();
    }
This is nicer and cleaner, breaks at compile time, and provides IntelliSense, so it's also faster while developing.
The good thing is that stored procedures may become very complex and may do many things. In my example above I check some data, insert a person record and some related ones as well, and in the end return the newly inserted Person record. Inserts and updates should usually return the data that was added/changed to reflect the actual state.

Stored procedures or inline queries?

First of all there is a partial question regarding this, but it is not exactly what I'm asking, so, bear with me and go for it.
My question is, after looking at what SubSonic does and the excellent videos from Rob Conery, I need to ask: shall we use a tool like this and do inline queries, or shall we do the queries using a call to a stored procedure?
I don't want to minimize any of Rob's work (which I think is amazing), but I just want your opinion on this because I need to start a new project and I'm on the fence: shall I use SubSonic (or a similar tool, like NHibernate), or shall I just continue with my method, which is to always call a stored procedure even if it's as simple as
    SELECT this, that FROM myTable WHERE myStuff = 'StackOverflow';
It doesn't need to be one or the other. If it's a simple query, use the SubSonic query tool. If it's more complex, use a stored procedure and load up a collection or create a dataset from the results.
See here: What are the pros and cons to keeping SQL in Stored Procs versus Code and here SubSonic and Stored Procedures
See answers here and here. I use sprocs whenever I can, except when red tape means it takes a week to make it into the database.
Stored procedures are gold when you have several applications that depend on the same database. It lets you define and maintain query logic once, rather than in several places.
On the other hand, it's pretty easy for stored procedures themselves to become a big jumbled mess in the database, since most systems don't have a good method for organizing them logically. And they can be more difficult to version and track changes.
I wouldn't personally follow rigid rules. Certainly during the development stages, you want to be able to quickly change your queries so I would inline them.
Later on, I would move to stored procedures because they offer the following two advantages. I'm sure there are more but these two win me over.
1/ Stored procedures group the data and the code for manipulating/extracting that data at one point. This makes the life of your DBA a lot easier (assuming your app is sizable enough to warrant a DBA) since they can optimize based on known factors.
One of the big bugbears of a DBA is ad-hoc queries (especially by clowns who don't know what a full table scan is). DBAs prefer to have nice consistent queries that they can tune the database to.
2/ Stored procedures can contain logic which is best left in the database. I've seen stored procs in DB2/z with many dozens of lines but all the client has to code is a single line like "give me that list".
Because the logic for "that list" is stored in the database, the DBAs can modify how it's stored and extracted at will without compromising or changing the client code. This is similar to the encapsulation that made object-oriented languages 'cleaner' than what came before.
I've done a mix of inline queries and stored procedures. I prefer more of the stored procedure/view approach as it gains a nice spot for you to make a change if needed. When you have inline queries you always have to go and change the code to change an inline query and then re-roll the application. You also might have the inline query in multiple places so you would have to change a lot more code than with one stored procedure.
Then again, if you have to add a parameter to a stored procedure, you're still changing a lot of code anyway.
Another consideration is how often the data behind the stored procedure changes. Where I work we have third-party tables that may be broken up into normalized tables, or a table becomes obsolete. In that case a stored procedure/view may minimize your exposure to that change.
I've also written an entire application without stored procedures. It had three classes and 10 pages, so stored procedures were not worth it at all. I think there comes a point when it's overkill, or can be justified, but it also comes down to your personal opinion and preference.
Are you going to only ever access your database from that one application?
If not, then you are probably better off using stored procedures so that you can have a consistent interface to your database.
Is there any significant cost to distributing your application if you need to make a change?
If so, then you are probably better off using stored procedures which can be changed at the server and those changes won't need to be distributed.
Are you at all concerned about the security of your database?
If so, then you probably want to use stored procedures so that you don't have to grant direct access to tables to a user.
If you're writing a small application, without a wide audience, for a system that won't be used or accessed outside of your application, then inline SQL might be ok.
With Subsonic you will use inline queries, views and stored procedures. Subsonic makes data access easier, but you can't do everything in a Subsonic query, though the latest version, 2.1, is getting better.
For basic CRUD operations, inline SQL will be straight forward. For more complex data needs, a view will need to be made and then you will do a Subsonic query on the view.
Stored procs are good for harder data computations and data retrieval. Set-based retrieval is usually faster than procedural processing.
Current Subsonic application uses all three options with great results.
I prefer inline sql unless the stored procedure has actual logic (variables, cursors, etc) involved. I have been using LINQ to SQL lately, and taking the generated classes and adding partial classes that have some predefined, common linq queries. I feel this makes for faster development.
Edit: I know I'm going to get downmodded for this. If you ever talk down on foreign keys or stored procedures, you will get downmodded. DBAs need job security I guess...
The advantages of stored procedure (to my mind)
The SQL is in one place
You are able to get query plans.
You can modify the database structure if necessary to improve performance
They are compiled and thus those query plans do not have to get constructed on the fly
If you use permissions - you can be sure of the queries that the application will make.
Stored procedures group the data and the code for manipulating/extracting that data at one point. This makes the life of your DBA a lot easier (assuming your app is sizable enough to warrant a DBA) since they can optimize based on known factors.
Stored procedures can contain logic which is best left in the database. I've seen stored procs in DB2/z with many dozens of lines but all the client has to code is a single line like "give me that list".
The best advantage of stored procs I have found is that when we want to change the logic, with inline queries we need to go to every place, change it, and re-roll every application, whereas with a stored proc the change is required in only one place.
So use inline queries when you have clear logic; otherwise prefer stored procs.

Is it better to write a more targeted stored procedure with fewer parameters?

Say I have a stored procedure that returns data from a SELECT query. I would like to get a slightly different cut on those results depending on what parameters I pass through. I'm wondering whether it is better design to have multiple stored procedures that take one or no parameters to do this (for example, GetXByDate or GetXByUser), or one stored procedure with multiple parameters that does the lot (for example, GetX)?
The advantage of the first option is that it's simpler and maybe faster, but disadvantage is that the essence of the query is duplicated across the stored procedures and needs to be maintained in several places.
The advantage of the second option is that the query is only present once, but the disadvantage is that the query is more complex and harder to troubleshoot.
What do you use in your solutions and why?
More complex stored procedures are harder for SQL Server to compile correctly and execute quickly and efficiently.
Even in the big stored procedure you either have to have several copies of the query or add lots of CASEs and IFs to it, which reduce performance. So you don't really gain much from lumping everything together.
From my personal experience, I also consider large SQL sproc code with lots of branches more difficult to maintain than several smaller, straightforward sprocs.
You could consider using views and UDFs to reduce copy-pasting of the query code.
That said, if you don't care about performance (intranet app, the queries are not that heavy and don't run that often), you might find having a universal sproc quite handy.
I would treat stored procedures much in the same way as I would a method on a class. It ought to do one thing and do it simply. Consider applying the same sorts of refactoring/code smell rules to your stored procedures that you would to your application code.
I prefer GetXByDate, GetXByUser, ... for simple stored procedures, on the basis they will require little maintenance anyway, and in this situation I think it is easier to maintain duplicate code than complicated code.
Of course, if you have more complicated stored procedures, this may not be true. GetAndProcessXByDate may be better reduced to GetXByDate, GetXByUser, ... which call another stored proc ProcessX.
So I guess the definitive answer is: it depends... :)
I second @tvanfosson.
However, I would add that you can do both: have a multi-use sproc (e.g. GetX) which contains the essential logic for a whole class of queries, and wrap it up in a series of smaller sprocs (GetXY, GetXZ) which execute the big one, passing in the appropriate parameters.
This means that you Don't Repeat Yourself, but you can also provide a simple interface to the client apps: an app which only ever calls GetXY doesn't have to know about GetXZ.
We use this approach sometimes.
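The shape of that arrangement can be sketched as follows. SQLite has no stored procedures, so in this Python example plain functions stand in for the sprocs; `get_x` plays the role of GetX and the two thin wrappers play the role of the task-specific sprocs. The table `x` and its columns are invented for the illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x (id INTEGER PRIMARY KEY, owner TEXT, created TEXT);
    INSERT INTO x (owner, created) VALUES
        ('ann', '2024-01-01'), ('bob', '2024-01-01'), ('ann', '2024-02-01');
""")


def get_x(conn, owner=None, created=None):
    """General routine: a None argument means 'do not filter on this column'."""
    sql = """
        SELECT id, owner, created FROM x
        WHERE (:owner IS NULL OR owner = :owner)
          AND (:created IS NULL OR created = :created)
        ORDER BY id
    """
    return conn.execute(sql, {"owner": owner, "created": created}).fetchall()


def get_x_by_owner(conn, owner):
    """Thin wrapper: the query logic still lives only in get_x."""
    return get_x(conn, owner=owner)


def get_x_by_date(conn, created):
    """Thin wrapper exposing a second, equally simple interface."""
    return get_x(conn, created=created)


by_owner = get_x_by_owner(conn, "ann")
by_date = get_x_by_date(conn, "2024-01-01")
```

A caller that only needs `get_x_by_owner` never sees the optional parameters. Note that the `(:p IS NULL OR col = :p)` pattern inside the general routine is exactly the kind of catch-all query that other answers here warn can be harder for the optimizer, so the trade-off discussed above still applies.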
One advantage of the single stored proc: if you're using a generated C# data access layer like LinqToSQL, a single class is generated to represent your resultset.
AJ's approach gives you the best of both worlds. The pain of having to maintain repeated code across several sprocs cannot be overstated.
Build Sproc and UDF modules for common use, and call them from task-specific sprocs.
Who/what will be calling these stored procedures? I wouldn't write stored procedures for SELECT statements normally, precisely because there are lots of different SELECT statements you might want, including joins to other tables etc.