Set Based Operations and calling Stored Procedures - sql

I am working on a stored procedure that performs some operations on students in the class
In the last step it updates status of some of the students based on some criteria.
It is all pretty straight forward but I have a dilemma here.
Basically there is an existing sp in the system called
pUpdateStudentStatus(studentID, statusID, comments, userID)
This sp is used by the application whenever a status of a single user is to be updated. Apart from updating the status it also logs the changes in the StudentStatusHistory table.
So here is my dilemma,
if I want to use that stored procedure I need loop through the records (either by cursor or by writing loop myself)
if I want to keep all operations set based I need to copy the logic from the pUpdateStudentStatus (which may change in the future)
Are there any other options? Which one would you choose?
I believe an alternative approach with the update trigger is not a way to go as I need some extra details such as userId of the user that changed the status, and comments
I am using SqlServer2005

You don't say whether pUpdateStudentStatus is under your control or created by a third party.
If it's a third party SP, I don't think you have a lot of choice but to use a cursor/loop, since the internals of the SP may change in future releases.
If the SP is under your control, another option would be to create a version of pUpdateStudentStatus with a new name which will operate in a set-based fashion (perhaps by accepting a table variable of arguments), then re-write the existing pUpdateStudentStatus to act as a wrapper calling the new procedure with a single row in the argument table.

Personally unless performance is an issue (and it sounds like this is most likely the sort of job that will run occasionally and maybe even scheduled outside work hours) I would loop over the existing procedure. CPU is invariably cheaper than DBA/Programmer time and maintenance considerations should override efficiency unless there is an impact on the business by not doing so. Either way you should document why you have adopted whichever approach you choose in the code.
Also, if you don't already have a documentation regime I would suggest setting up a simple documentation table within the database with (at least) sp name and descriptive text. Because of the nature of stored procedures/user defined functions keeping overview control over what functionality has been implemented where can be tricky unless some strategy is adopted and I've seen far too many databases where there is a mass of stored procedures/udfs and no simple method of groking what functionality has been implemented where. Version control and full documentation is to be applauded if your group supports it, but if that isn't available then documenting the database inside itself is simple, robust, and a quick win.

If you want to keep the operation set-based then yes, sadly, you will need to copy and paste the sql from pUpdateStudentStatus.
You will need to decide between the performance of a set-based update on the one hand, and code re-use (and ease-of-maintenance) on the other. I know which I would normally choose, but your choice depends on your need for performance versus other considerations.

If you are doing a small number of records, looping is acceptable, but if the batch processes ever get big, you will need set-based code.
An other alternative to what other suggested if you end up needing the set-based logic is to change the proc to allow either set-based or individual inserts. By making the parameters optional (your GUI will need to check to make sure all required parameters are passed for individual inserts) and adding a parameter for a batchnumber to be passed in for set-based operation, you can put the logic for both in one proc.
If batch number is null, do the current actions. If it is passed go to the batch processing part of the proc. For batch processses, the insert proc can becalled by another proc that generates a new batch number, inserts the info you want to insert into a work table including the batch number. then it uses the batchnumebr as the input parameter for the insert proc.
You still will have to write the logic for both cases, but since they are in the same proc, they will be easier to maintain and you will be less likely to forget to update both processes.

Related

Convert an UPDATE into a SELECT

I'm making a simple web interface to allow execution of queries against a database (yeah, I know, I know it's a really bad practice, but it's a private website used only by a few trusted users that currently use directly a DB manager to execute these queries, so the web interface is only to make more automatic the process).
The thing is that, for safety, whenever an UPDATE query is detected I want to first execute a SELECT statement "equivalent" to the update (keeping WHERE clause) to retrieve how many records are going to be affected prior to execute the UPDATE.
The idea is to replace "UPDATE" by "SELECT * FROM" and remove the whole "SET" clause without removing the "WHERE".
I'm trying replacing UPDATE\s*(.*?)\s*SET.*(\s*WHERE\s*.*) by SELECT * FROM \1 \2 and similar but i'm having troubles when there is no "WHERE" clause (uncommon, but possible).
edit: It's pretty hard to explain why I need this to be done like this, but I do, I know about stored procedures, query builders, transactions, etc... but for my case it's not what I need to be able to do.
You should fix your design. There is nothing wrong with users updating data in a database. The question is how they do it. My strong suggestion is to wrap the update statements in stored procedures. Then only allow updates through the stored procedures.
There are several main reasons why I prefer this approach:
I think a well-designed API produces more stable and maintainable code.
The stored procedures control security.
The stored procedures allow better logging of what is happening in the database.
The stored procedures provide control over what users can do.
In your case, though, they offer another advantage. Because all the update code is on the database-side, you know what the update statements look like. So, you can then decide how you want to get the "pre-counts" (which is what I assume you are looking for).
EDIT:
There is also an important flaw in your design (as you describe it). The data might change between the update and the select. If you use stored procedures, there are ways to address this. For instance, you can wrap the operations in a transaction. Or, you use a SELECT to get the rows to be updated, lock those rows (depending on the database), and only do the update on those rows.

Common function / stored procedures and incompatible updates

We have a number of modules within a larger suite that all use a common set of stored procedures and functions, due largely in part to the fact that they all use a common set of data and tables. This approach ensures that all modules receive the same answers for when making the same calls - a very good thing (especially in the financial industry)
However, the downside of this approach is that when we update one module in the suite that requires a change to one of the shared stored procedures or functions it requires that we update almost the entire suite. Which is a bad thing due to time and cost.
What kind of strategies can be employed to mitigate this suite upgrade issue every time we update a single stored procedure, while minimizing the management complexity.
This is similar somewhat to the version issue that Microsoft had with DLL(s) / API(s) where you would see a signature change on an API that would necessitate a xxx2 version, which is not ideal cause then you have basically two versions which both need to be maintained and upgraded with the potential that they get out of synch (i.e. two different answers for the same question).
Any strategies or best practices in this regard would be greatly appreciated.
Thanks in advance.
Whatty
Two strategies are views and stored procedures.
Use views to access the data in the tables. This way, you can change the underlying tables (for one module) and not have to change other modules immediately. Eventually, you can get around to changing that module as well. For instance, you might split one table into two different tables for one module. Other modules will never notice, because they access a view, and you just modify the view to return the original data.
Along these lines, you never want to use select *, because the columns, their names, or their ordering might change.
The second strategy is to wrap all insert, update, and delete into stored procedures. This has a second advantage that you can do checking, logging, and notifications in the procedure. You can try to mimic this with triggers and constraints, but I find the stored procedure approach much more maintainable.

Does an Upsert violate the Single Responsibility Principle?

I like to use Upsert stored procedures that update records if they exist or insert them if they don't. Without them, I would need to first find out if the record exists, and then have two separate stored procedures that I would call based on the result.
I never really thought about the issue before today when I was creating a stored procedure called UpdateOrDeleteRow. As soon as I found myself including "Or" in the name, my SRP spider sense kicked in, and I realized that the upserts are basically the same thing.
Is this a violation of SRP? If so, is it acceptable? If not, what should I do?
I realize that the SRP is an OOP principle, and T-SQL is not an OOP language, but the basis for the principle seems like it should apply here as well.
There is another principle, which I like even more, than SRP - DRY. So, if you call this sequence in one place, you can think about single responsibility. But when you repeating same sequence of actions several times, DRY makes me to remove duplication.
BTW Just come to my mind, that you can avoid OR in procedure/method name. UpdateOrInsert operation has very good name Save. I think it does not breaks SRP.
Personally I don't believe that this principle applies completely in SQL Server. Stored procedures don't always perform just one action (and I think the notion that a stored procedure is equivalent to a class is flawed). I don't think it makes sense to split every single statement in a stored procedure into its own stored procedure. You can get absolutely ridiculous with this.
There is a balance of course, as you can be ridiculous the other way. You don't want a stored procedure with 18 different ways to specify parameters so that it can do 540 different things based on the combinations.
For an UPSERT I would still suggest that a single stored procedure is fine for this. If you want to feel better about it serving a single purpose, change your update/insert into a single MERGE. :-) That said, and in all seriousness, be very careful with MERGE.
I would disagree that the principal should apply in this case as it makes for some redundant code in your codebehind.
First lets examine what your UPSERT does, It checks if data exists then based on that it executes an INSERT or an UPDATE.
In codebehind to do this you have to make 2 calls to your database, depending on how your application is structured this could also mean opening and closing two connections.
So you have 3 methods in codebehind (one to execute each proc) then a method to call each of those methods and do the logic to decide if you need to insert or update.
You also have 3 seperate stored procedures in your database to do each of the actions.
This to me seems like badly structured code since you would be passing the same parameters to your insert / update procedures as you would to your upsert, it therefore makes sense to do this all in one place.
By using an UPSERT you have 1 stored procedure and only need the one connection, with one method to be called from codebehind. I think that this makes for much better, cleaner code.
If you already have procs that do the Update or Delete operations independently, hopefully with logging for auditing purposes, you could have your upsert proc call those individually. That way only those procs are doing the work which should help keep things manageable, even if they're being called from multiple locations.
The single responsibility principle says that an object should only have one reason to change. The only reason that an Upsert stored procedure should change is if the table structure changes. Thus, I think you are okay in creating upsert stored procedures.

Is it a good idea to use a trigger to record a time a row is updated?

I hate triggers. I've been tripped up by them way too many times. But I'd like to know if it's a better to specify the time every time a row is updated or let a trigger take care of it to keep code to a minimum?
Just to clarify, the table the trigger would be on would have a column called something like LastModified
The particular scenario I'm dealing with is I am one of the developers who uses a database with about 400 stored procedures. There are about 20 tables that would have this LastModified column. These tables would be updated by about 10 different stored procedures each.
Triggers can definitely be a huge problem, especially if there are multiple layers of them. It makes debugging, performance tuning, and understanding the data logic almost impossible.
But if you keep your design to a single layer (a trigger for the table), and it is used soley for auditing (i.e. putting the updated time), I don't think that'll be a big problem at all.
Equally, if you are using a stored procedure to be the actor to your tables and views, I think it would make just as much sense (and be a lot easier to remember and look back on) to have your stored procedure put in the current datetime stamp. I think that's a great design.
But if you're using ad hoc queries, and you have a datetime field as not null, it will be a hindrance to remember to call the current datetime. Obviously this won't be a problem with the aforementioned two ideas with the stored procedure or the trigger.
So I think in this situation, is should be personal preference (as long as you don't make a string of triggers that become spaghetti).
You can always use a TimeStamp column (in MySql). If it's MSSQL, just note that the timestamp column is a serial number, not a DateTime data type.
Generally, I avoid triggers like the plague (close to on par with cursors) and so always just update the column manually. To me, it helps reinforce the business logic. But in truth, "better" is a a personal opinion.
Generally, triggers are most useful as a bottleneck if you have a table that needs certain business logic applied for every update, and you have updates coming from different sources.
For example suppose you have php and python apps that both use the database. You could try and remember to update the python code every time you update the php, and vice versa, and hope that they both work the same. Or you could use a trigger, so that no matter what client connects, the same thing will happen to the table.
In most other cases triggers are a bad idea and will cause more pain than they are worth.
Its true that some triggers can often be a source of confusion and frustration when debugging, especially when there are cascading foreign key updates.
But they do have their use.
In this case, you have a choice of updating 400 stored procedures, and updating the date manually. If you miss one of those stored procs, then the field is as good as useless.
Also while triggers 'hide' functionality, the same can be said of explicitly updating it in a stored procedure. What happens when someone writes a new stored procedure, you need to document that the field should be updated.
Personally if its critical the field is up to date, I would use a trigger. Use an INSTEAD OF trigger, so that instead of doing an update and that triggering the trigger, you are simply overriding the insert statement.

Stored procedures or inline queries?

First of all there is a partial question regarding this, but it is not exactly what I'm asking, so, bear with me and go for it.
My question is, after looking at what SubSonic does and the excellent videos from Rob Connery I need to ask: Shall we use a tool like this and do Inline queries or shall we do the queries using a call to the stored procedure?
I don't want to minimize any work from Rob (which I think it's amazing) but I just want your opinion on this cause I need to start a new project and I'm in the middle of the line; shall I use SubSonic (or other like tool, like NHibernate) or I just continue my method that is always call a stored procedure even if it's a simple as
Select this, that from myTable where myStuff = StackOverflow;
It doesn't need to be one or the other. If it's a simple query, use the SubSonic query tool. If it's more complex, use a stored procedure and load up a collection or create a dataset from the results.
See here: What are the pros and cons to keeping SQL in Stored Procs versus Code and here SubSonic and Stored Procedures
See answers here and here. I use sprocs whenever I can, except when red tape means it takes a week to make it into the database.
Stored procedures are gold when you have several applications that depend on the same database. It let's you define and maintain query logic once, rather than several places.
On the other hand, it's pretty easy for stored procedures themselves to become a big jumbled mess in the database, since most systems don't have a good method for organizing them logically. And they can be more difficult to version and track changes.
I wouldn't personally follow rigid rules. Certainly during the development stages, you want to be able to quickly change your queries so I would inline them.
Later on, I would move to stored procedures because they offer the following two advantages. I'm sure there are more but these two win me over.
1/ Stored procedures group the data and the code for manipulating/extracting that data at one point. This makes the life of your DBA a lot easier (assuming your app is sizable enough to warrant a DBA) since they can optimize based on known factors.
One of the big bugbears of a DBA is ad-hoc queries (especially by clowns who don't know what a full table scan is). DBAs prefer to have nice consistent queries that they can tune the database to.
2/ Stored procedures can contain logic which is best left in the database. I've seen stored procs in DB2/z with many dozens of lines but all the client has to code is a single line like "give me that list".
Because the logic for "that list" is stored in the database, the DBAs can modify how it's stored and extracted at will without compromising or changing the client code. This is similar to encapsulation that made object-orientd languages 'cleaner' than what came before.
I've done a mix of inline queries and stored procedures. I prefer more of the stored procedure/view approach as it gains a nice spot for you to make a change if needed. When you have inline queries you always have to go and change the code to change an inline query and then re-roll the application. You also might have the inline query in multiple places so you would have to change a lot more code than with one stored procedure.
Then again if you have to add a parameter to a stored procedure, your still changing a lot of code anyways.
Another note is how often the data changes behind the stored procedure, where I work we have third party tables that may break up into normalized tables, or a table becomes obsolete. In that case a stored procedure/view may minimize the exposure you have to that change.
I've also written a entire application without stored procedures. It had three classes and 10 pages, was not worth it at all. I think there comes a point when its overkill, or can be justified, but it also comes down to your personal opinion and preference.
Are you going to only ever access your database from that one application?
If not, then you are probably better off using stored procedures so that you can have a consistent interface to your database.
Is there any significant cost to distributing your application if you need to make a change?
If so, then you are probably better off using stored procedures which can be changed at the server and those changes won't need to be distributed.
Are you at all concerned about the security of your database?
If so, then you probably want to use stored procedures so that you don't have to grant direct access to tables to a user.
If you're writing a small application, without a wide audience, for a system that won't be used or accessed outside of your application, then inline SQL might be ok.
With Subsonic you will use inline, views and stored procedures. Subsonic makes data access easier, but you can't do everthing in a subsonic query. Though the latest version, 2.1 is getting better.
For basic CRUD operations, inline SQL will be straight forward. For more complex data needs, a view will need to be made and then you will do a Subsonic query on the view.
Stored procs are good for harder data computations and data retrieval. Set based retrieval is usually always faster then procedural processing.
Current Subsonic application uses all three options with great results.
I prefer inline sql unless the stored procedure has actual logic (variables, cursors, etc) involved. I have been using LINQ to SQL lately, and taking the generated classes and adding partial classes that have some predefined, common linq queries. I feel this makes for faster development.
Edit: I know I'm going to get downmodded for this. If you ever talk down on foreign keys or stored procedures, you will get downmodded. DBAs need job security I guess...
The advantages of stored procedure (to my mind)
The SQL is in one place
You are able to get query plans.
You can modify the database structure if necessary to improve performance
They are compiled and thus those query plans do not have to get constructed on the fly
If you use permissions - you can be sure of the queries that the application will make.
Stored procedures group the data and the code for manipulating/extracting that data at one point. This makes the life of your DBA a lot easier (assuming your app is sizable enough to warrant a DBA) since they can optimize based on known factors.
Stored procedures can contain logic which is best left in the database. I've seen stored procs in DB2/z with many dozens of lines but all the client has to code is a single line like "give me that list".
the best advantage of using stored procs i found is that when we want to change in the logic, in case of inline query we need to go to everyplace and change it and re- roll the every application but in the case of stored proc change is required only at one place.
So use inline queries when you have clear logic; otherwise prefer stored procs.