Does an Upsert violate the Single Responsibility Principle? - sql

I like to use Upsert stored procedures that update records if they exist or insert them if they don't. Without them, I would need to first find out if the record exists, and then have two separate stored procedures that I would call based on the result.
I never really thought about the issue before today when I was creating a stored procedure called UpdateOrDeleteRow. As soon as I found myself including "Or" in the name, my SRP spider sense kicked in, and I realized that the upserts are basically the same thing.
Is this a violation of SRP? If so, is it acceptable? If not, what should I do?
I realize that the SRP is an OOP principle, and T-SQL is not an OOP language, but the basis for the principle seems like it should apply here as well.

There is another principle, which I like even more, than SRP - DRY. So, if you call this sequence in one place, you can think about single responsibility. But when you repeating same sequence of actions several times, DRY makes me to remove duplication.
BTW Just come to my mind, that you can avoid OR in procedure/method name. UpdateOrInsert operation has very good name Save. I think it does not breaks SRP.

Personally I don't believe that this principle applies completely in SQL Server. Stored procedures don't always perform just one action (and I think the notion that a stored procedure is equivalent to a class is flawed). I don't think it makes sense to split every single statement in a stored procedure into its own stored procedure. You can get absolutely ridiculous with this.
There is a balance of course, as you can be ridiculous the other way. You don't want a stored procedure with 18 different ways to specify parameters so that it can do 540 different things based on the combinations.
For an UPSERT I would still suggest that a single stored procedure is fine for this. If you want to feel better about it serving a single purpose, change your update/insert into a single MERGE. :-) That said, and in all seriousness, be very careful with MERGE.

I would disagree that the principal should apply in this case as it makes for some redundant code in your codebehind.
First lets examine what your UPSERT does, It checks if data exists then based on that it executes an INSERT or an UPDATE.
In codebehind to do this you have to make 2 calls to your database, depending on how your application is structured this could also mean opening and closing two connections.
So you have 3 methods in codebehind (one to execute each proc) then a method to call each of those methods and do the logic to decide if you need to insert or update.
You also have 3 seperate stored procedures in your database to do each of the actions.
This to me seems like badly structured code since you would be passing the same parameters to your insert / update procedures as you would to your upsert, it therefore makes sense to do this all in one place.
By using an UPSERT you have 1 stored procedure and only need the one connection, with one method to be called from codebehind. I think that this makes for much better, cleaner code.

If you already have procs that do the Update or Delete operations independently, hopefully with logging for auditing purposes, you could have your upsert proc call those individually. That way only those procs are doing the work which should help keep things manageable, even if they're being called from multiple locations.

The single responsibility principle says that an object should only have one reason to change. The only reason that an Upsert stored procedure should change is if the table structure changes. Thus, I think you are okay in creating upsert stored procedures.

Related

Convert an UPDATE into a SELECT

I'm making a simple web interface to allow execution of queries against a database (yeah, I know, I know it's a really bad practice, but it's a private website used only by a few trusted users that currently use directly a DB manager to execute these queries, so the web interface is only to make more automatic the process).
The thing is that, for safety, whenever an UPDATE query is detected I want to first execute a SELECT statement "equivalent" to the update (keeping WHERE clause) to retrieve how many records are going to be affected prior to execute the UPDATE.
The idea is to replace "UPDATE" by "SELECT * FROM" and remove the whole "SET" clause without removing the "WHERE".
I'm trying replacing UPDATE\s*(.*?)\s*SET.*(\s*WHERE\s*.*) by SELECT * FROM \1 \2 and similar but i'm having troubles when there is no "WHERE" clause (uncommon, but possible).
edit: It's pretty hard to explain why I need this to be done like this, but I do, I know about stored procedures, query builders, transactions, etc... but for my case it's not what I need to be able to do.
You should fix your design. There is nothing wrong with users updating data in a database. The question is how they do it. My strong suggestion is to wrap the update statements in stored procedures. Then only allow updates through the stored procedures.
There are several main reasons why I prefer this approach:
I think a well-designed API produces more stable and maintainable code.
The stored procedures control security.
The stored procedures allow better logging of what is happening in the database.
The stored procedures provide control over what users can do.
In your case, though, they offer another advantage. Because all the update code is on the database-side, you know what the update statements look like. So, you can then decide how you want to get the "pre-counts" (which is what I assume you are looking for).
EDIT:
There is also an important flaw in your design (as you describe it). The data might change between the update and the select. If you use stored procedures, there are ways to address this. For instance, you can wrap the operations in a transaction. Or, you use a SELECT to get the rows to be updated, lock those rows (depending on the database), and only do the update on those rows.

Is it a good idea to use a trigger to record a time a row is updated?

I hate triggers. I've been tripped up by them way too many times. But I'd like to know if it's a better to specify the time every time a row is updated or let a trigger take care of it to keep code to a minimum?
Just to clarify, the table the trigger would be on would have a column called something like LastModified
The particular scenario I'm dealing with is I am one of the developers who uses a database with about 400 stored procedures. There are about 20 tables that would have this LastModified column. These tables would be updated by about 10 different stored procedures each.
Triggers can definitely be a huge problem, especially if there are multiple layers of them. It makes debugging, performance tuning, and understanding the data logic almost impossible.
But if you keep your design to a single layer (a trigger for the table), and it is used soley for auditing (i.e. putting the updated time), I don't think that'll be a big problem at all.
Equally, if you are using a stored procedure to be the actor to your tables and views, I think it would make just as much sense (and be a lot easier to remember and look back on) to have your stored procedure put in the current datetime stamp. I think that's a great design.
But if you're using ad hoc queries, and you have a datetime field as not null, it will be a hindrance to remember to call the current datetime. Obviously this won't be a problem with the aforementioned two ideas with the stored procedure or the trigger.
So I think in this situation, is should be personal preference (as long as you don't make a string of triggers that become spaghetti).
You can always use a TimeStamp column (in MySql). If it's MSSQL, just note that the timestamp column is a serial number, not a DateTime data type.
Generally, I avoid triggers like the plague (close to on par with cursors) and so always just update the column manually. To me, it helps reinforce the business logic. But in truth, "better" is a a personal opinion.
Generally, triggers are most useful as a bottleneck if you have a table that needs certain business logic applied for every update, and you have updates coming from different sources.
For example suppose you have php and python apps that both use the database. You could try and remember to update the python code every time you update the php, and vice versa, and hope that they both work the same. Or you could use a trigger, so that no matter what client connects, the same thing will happen to the table.
In most other cases triggers are a bad idea and will cause more pain than they are worth.
Its true that some triggers can often be a source of confusion and frustration when debugging, especially when there are cascading foreign key updates.
But they do have their use.
In this case, you have a choice of updating 400 stored procedures, and updating the date manually. If you miss one of those stored procs, then the field is as good as useless.
Also while triggers 'hide' functionality, the same can be said of explicitly updating it in a stored procedure. What happens when someone writes a new stored procedure, you need to document that the field should be updated.
Personally if its critical the field is up to date, I would use a trigger. Use an INSTEAD OF trigger, so that instead of doing an update and that triggering the trigger, you are simply overriding the insert statement.

Set Based Operations and calling Stored Procedures

I am working on a stored procedure that performs some operations on students in the class
In the last step it updates status of some of the students based on some criteria.
It is all pretty straight forward but I have a dilemma here.
Basically there is an existing sp in the system called
pUpdateStudentStatus(studentID, statusID, comments, userID)
This sp is used by the application whenever a status of a single user is to be updated. Apart from updating the status it also logs the changes in the StudentStatusHistory table.
So here is my dilemma,
if I want to use that stored procedure I need loop through the records (either by cursor or by writing loop myself)
if I want to keep all operations set based I need to copy the logic from the pUpdateStudentStatus (which may change in the future)
Are there any other options? Which one would you choose?
I believe an alternative approach with the update trigger is not a way to go as I need some extra details such as userId of the user that changed the status, and comments
I am using SqlServer2005
You don't say whether pUpdateStudentStatus is under your control or created by a third party.
If it's a third party SP, I don't think you have a lot of choice but to use a cursor/loop, since the internals of the SP may change in future releases.
If the SP is under your control, another option would be to create a version of pUpdateStudentStatus with a new name which will operate in a set-based fashion (perhaps by accepting a table variable of arguments), then re-write the existing pUpdateStudentStatus to act as a wrapper calling the new procedure with a single row in the argument table.
Personally unless performance is an issue (and it sounds like this is most likely the sort of job that will run occasionally and maybe even scheduled outside work hours) I would loop over the existing procedure. CPU is invariably cheaper than DBA/Programmer time and maintenance considerations should override efficiency unless there is an impact on the business by not doing so. Either way you should document why you have adopted whichever approach you choose in the code.
Also, if you don't already have a documentation regime I would suggest setting up a simple documentation table within the database with (at least) sp name and descriptive text. Because of the nature of stored procedures/user defined functions keeping overview control over what functionality has been implemented where can be tricky unless some strategy is adopted and I've seen far too many databases where there is a mass of stored procedures/udfs and no simple method of groking what functionality has been implemented where. Version control and full documentation is to be applauded if your group supports it, but if that isn't available then documenting the database inside itself is simple, robust, and a quick win.
If you want to keep the operation set-based then yes, sadly, you will need to copy and paste the sql from pUpdateStudentStatus.
You will need to decide between the performance of a set-based update on the one hand, and code re-use (and ease-of-maintenance) on the other. I know which I would normally choose, but your choice depends on your need for performance versus other considerations.
If you are doing a small number of records, looping is acceptable, but if the batch processes ever get big, you will need set-based code.
An other alternative to what other suggested if you end up needing the set-based logic is to change the proc to allow either set-based or individual inserts. By making the parameters optional (your GUI will need to check to make sure all required parameters are passed for individual inserts) and adding a parameter for a batchnumber to be passed in for set-based operation, you can put the logic for both in one proc.
If batch number is null, do the current actions. If it is passed go to the batch processing part of the proc. For batch processses, the insert proc can becalled by another proc that generates a new batch number, inserts the info you want to insert into a work table including the batch number. then it uses the batchnumebr as the input parameter for the insert proc.
You still will have to write the logic for both cases, but since they are in the same proc, they will be easier to maintain and you will be less likely to forget to update both processes.

Stored procedures or inline queries?

First of all there is a partial question regarding this, but it is not exactly what I'm asking, so, bear with me and go for it.
My question is, after looking at what SubSonic does and the excellent videos from Rob Connery I need to ask: Shall we use a tool like this and do Inline queries or shall we do the queries using a call to the stored procedure?
I don't want to minimize any work from Rob (which I think it's amazing) but I just want your opinion on this cause I need to start a new project and I'm in the middle of the line; shall I use SubSonic (or other like tool, like NHibernate) or I just continue my method that is always call a stored procedure even if it's a simple as
Select this, that from myTable where myStuff = StackOverflow;
It doesn't need to be one or the other. If it's a simple query, use the SubSonic query tool. If it's more complex, use a stored procedure and load up a collection or create a dataset from the results.
See here: What are the pros and cons to keeping SQL in Stored Procs versus Code and here SubSonic and Stored Procedures
See answers here and here. I use sprocs whenever I can, except when red tape means it takes a week to make it into the database.
Stored procedures are gold when you have several applications that depend on the same database. It let's you define and maintain query logic once, rather than several places.
On the other hand, it's pretty easy for stored procedures themselves to become a big jumbled mess in the database, since most systems don't have a good method for organizing them logically. And they can be more difficult to version and track changes.
I wouldn't personally follow rigid rules. Certainly during the development stages, you want to be able to quickly change your queries so I would inline them.
Later on, I would move to stored procedures because they offer the following two advantages. I'm sure there are more but these two win me over.
1/ Stored procedures group the data and the code for manipulating/extracting that data at one point. This makes the life of your DBA a lot easier (assuming your app is sizable enough to warrant a DBA) since they can optimize based on known factors.
One of the big bugbears of a DBA is ad-hoc queries (especially by clowns who don't know what a full table scan is). DBAs prefer to have nice consistent queries that they can tune the database to.
2/ Stored procedures can contain logic which is best left in the database. I've seen stored procs in DB2/z with many dozens of lines but all the client has to code is a single line like "give me that list".
Because the logic for "that list" is stored in the database, the DBAs can modify how it's stored and extracted at will without compromising or changing the client code. This is similar to encapsulation that made object-orientd languages 'cleaner' than what came before.
I've done a mix of inline queries and stored procedures. I prefer more of the stored procedure/view approach as it gains a nice spot for you to make a change if needed. When you have inline queries you always have to go and change the code to change an inline query and then re-roll the application. You also might have the inline query in multiple places so you would have to change a lot more code than with one stored procedure.
Then again if you have to add a parameter to a stored procedure, your still changing a lot of code anyways.
Another note is how often the data changes behind the stored procedure, where I work we have third party tables that may break up into normalized tables, or a table becomes obsolete. In that case a stored procedure/view may minimize the exposure you have to that change.
I've also written a entire application without stored procedures. It had three classes and 10 pages, was not worth it at all. I think there comes a point when its overkill, or can be justified, but it also comes down to your personal opinion and preference.
Are you going to only ever access your database from that one application?
If not, then you are probably better off using stored procedures so that you can have a consistent interface to your database.
Is there any significant cost to distributing your application if you need to make a change?
If so, then you are probably better off using stored procedures which can be changed at the server and those changes won't need to be distributed.
Are you at all concerned about the security of your database?
If so, then you probably want to use stored procedures so that you don't have to grant direct access to tables to a user.
If you're writing a small application, without a wide audience, for a system that won't be used or accessed outside of your application, then inline SQL might be ok.
With Subsonic you will use inline, views and stored procedures. Subsonic makes data access easier, but you can't do everthing in a subsonic query. Though the latest version, 2.1 is getting better.
For basic CRUD operations, inline SQL will be straight forward. For more complex data needs, a view will need to be made and then you will do a Subsonic query on the view.
Stored procs are good for harder data computations and data retrieval. Set based retrieval is usually always faster then procedural processing.
Current Subsonic application uses all three options with great results.
I prefer inline sql unless the stored procedure has actual logic (variables, cursors, etc) involved. I have been using LINQ to SQL lately, and taking the generated classes and adding partial classes that have some predefined, common linq queries. I feel this makes for faster development.
Edit: I know I'm going to get downmodded for this. If you ever talk down on foreign keys or stored procedures, you will get downmodded. DBAs need job security I guess...
The advantages of stored procedure (to my mind)
The SQL is in one place
You are able to get query plans.
You can modify the database structure if necessary to improve performance
They are compiled and thus those query plans do not have to get constructed on the fly
If you use permissions - you can be sure of the queries that the application will make.
Stored procedures group the data and the code for manipulating/extracting that data at one point. This makes the life of your DBA a lot easier (assuming your app is sizable enough to warrant a DBA) since they can optimize based on known factors.
Stored procedures can contain logic which is best left in the database. I've seen stored procs in DB2/z with many dozens of lines but all the client has to code is a single line like "give me that list".
the best advantage of using stored procs i found is that when we want to change in the logic, in case of inline query we need to go to everyplace and change it and re- roll the every application but in the case of stored proc change is required only at one place.
So use inline queries when you have clear logic; otherwise prefer stored procs.

Is it better to write a more targeted stored procedure with fewer parameters?

Say I have a stored procedure that returns data from a SELECT query. I would like to get a slightly different cut on those results depending on what parameters I pass through. I'm wondering whether it is better design to have multiple stored procedures that take one or no parameters to do this (for example, GetXByDate or GetXByUser), or one stored procedure with multiple parameters that does the lot (for example, GetX)?
The advantage of the first option is that it's simpler and maybe faster, but disadvantage is that the essence of the query is duplicated across the stored procedures and needs to be maintained in several places.
The advantage of the second option is that the query is only present once, but the disadvantage is that the query is more complex and harder to troubleshoot.
What do you use in your solutions and why?
The more complex stored procedures are more complex for the SQL server to compile
correctly and execute quickly and efficiently.
Even in the big stored procedure you have to either have to have several copies of the query or add lots of CASEs and IFs in it which reduce performance. So you don't really gain much from lumping everything together.
From my personal experience I also consider large SQL sp code with lots of branches more difficult to maintain that several smaller and straightforward sprocs.
You could consider using views and UDFs to reduce copy-pasting of the query code.
Saying that if you don't care about performance (intranet app, the queries are not that heavy, don't run that often) you might find having a universal sproc quite handy.
I would treat stored procedures much in the same way as I would a method on a class. It ought to do one thing and do it simply. Consider applying the same sorts of refactoring/code smell rules to your stored procedures that you would to your application code.
I prefer GetXByDate, GetXByUser, ... for simple stored procedures, on the basis they will require little maintenance anyway, and in this situation I think it is easier to maintain duplicate code than complicated code.
Of course, if you have more complicated stored procedures, this may not be true. GetAndProcessXByDate may be better reduced to GetXByDate, GetXByUser, ... which call another stored proc ProcessX.
So I guess the definitive answer is: it depends... :)
I second #tvanfosson.
However, I would add that you can do both: have a multi-use sproc (e.g. GetX) which contains the essential logic for a whole class of queries, and wrap it up in a series of smaller sprocs (GetXY, GetXZ) which execute the big one, passing in the appropriate parameters.
This means that you Don't Repeat Yourself, but you can also provide a simple interface to the client apps: an app which only ever calls GetXY doesn't have to know about GetXZ.
We use this approach sometimes.
One advantage of the single stored proc if you're using a generated C# data access layer like LinqToSQL a single class is generated to represent your resultset.
AJs approach gives you the best of both worlds. The pain of having to maintain repeated code across several sprocs cannot be overstated.
Build Sproc and UDF modules for common use, and call them from task-specific sprocs.
Who/what will be calling these stored procedures? I wouldn't write stored procedures for SELECT statements normally, precisely because there are lots of different SELECT statements you might want, including joins to other tables etc.