Convert an UPDATE into a SELECT - sql

I'm making a simple web interface to allow execution of queries against a database (yeah, I know, it's a really bad practice, but it's a private website used only by a few trusted users who currently run these queries directly in a DB manager, so the web interface only serves to automate the process).
The thing is that, for safety, whenever an UPDATE query is detected I want to first execute a SELECT statement "equivalent" to the update (keeping the WHERE clause) to find out how many records are going to be affected before executing the UPDATE.
The idea is to replace "UPDATE" by "SELECT * FROM" and remove the whole "SET" clause without removing the "WHERE".
I'm trying to replace UPDATE\s*(.*?)\s*SET.*(\s*WHERE\s*.*) with SELECT * FROM \1 \2 and similar, but I'm having trouble when there is no "WHERE" clause (uncommon, but possible).
edit: It's pretty hard to explain why I need this to be done like this, but I do. I know about stored procedures, query builders, transactions, etc., but they are not what I need in my case.
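One way to cope with a missing WHERE is to make that whole clause an optional group in the pattern. The following is only a minimal Python sketch of the rewrite described above, not a SQL parser. The function name update_to_select is hypothetical, and the sketch assumes a single plain UPDATE statement in which the word WHERE does not appear inside string literals or subqueries of the SET clause.

```python
import re

# Shape assumed: UPDATE <table> SET <assignments> [WHERE <condition>] [;]
_UPDATE_RE = re.compile(
    r"^\s*UPDATE\s+(?P<table>.+?)\s+SET\s+.+?(?P<where>\s+WHERE\s+.+?)?\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def update_to_select(sql):
    """Return a SELECT covering the rows the UPDATE would touch, or None."""
    match = _UPDATE_RE.match(sql)
    if match is None:
        return None  # not an UPDATE of the assumed shape
    select = "SELECT * FROM " + match.group("table")
    where = (match.group("where") or "").strip()
    # The WHERE group is optional, so an UPDATE without it selects every row.
    return select + " " + where if where else select
```

With this sketch, an UPDATE that has no WHERE clause maps to a SELECT over the whole table, which makes the "every row will be affected" case visible before anything is executed.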

You should fix your design. There is nothing wrong with users updating data in a database. The question is how they do it. My strong suggestion is to wrap the update statements in stored procedures. Then only allow updates through the stored procedures.
There are several main reasons why I prefer this approach:
I think a well-designed API produces more stable and maintainable code.
The stored procedures control security.
The stored procedures allow better logging of what is happening in the database.
The stored procedures provide control over what users can do.
In your case, though, they offer another advantage. Because all the update code is on the database-side, you know what the update statements look like. So, you can then decide how you want to get the "pre-counts" (which is what I assume you are looking for).
EDIT:
There is also an important flaw in your design (as you describe it). The data might change between the select and the update. If you use stored procedures, there are ways to address this. For instance, you can wrap the operations in a transaction. Or you can use a SELECT to get the rows to be updated, lock those rows (depending on the database), and only do the update on those rows.
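The transaction idea can be sketched without rewriting the statement at all: run the UPDATE inside a transaction, read the affected-row count, and roll back if it is higher than expected. This is a minimal sketch using Python's sqlite3 module as a stand-in for the real database. The function name update_with_limit and the max_rows threshold are assumptions for illustration, not part of the original design.

```python
import sqlite3


def update_with_limit(conn, update_sql, params, max_rows):
    """Run an UPDATE, but keep it only if it touched at most max_rows rows.

    Returns (committed, affected_rows). Relies on sqlite3 opening a
    transaction implicitly before the UPDATE (the module's default mode).
    """
    cur = conn.cursor()
    cur.execute(update_sql, params)
    affected = cur.rowcount
    if affected > max_rows:
        conn.rollback()  # undo the change; nothing else saw it
        return False, affected
    conn.commit()
    return True, affected
```

Because the count comes from the UPDATE itself, there is no window in which the data can change between a separate count query and the update.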

Related

Does an Upsert violate the Single Responsibility Principle?

I like to use Upsert stored procedures that update records if they exist or insert them if they don't. Without them, I would need to first find out if the record exists, and then have two separate stored procedures that I would call based on the result.
I never really thought about the issue before today when I was creating a stored procedure called UpdateOrDeleteRow. As soon as I found myself including "Or" in the name, my SRP spider sense kicked in, and I realized that the upserts are basically the same thing.
Is this a violation of SRP? If so, is it acceptable? If not, what should I do?
I realize that the SRP is an OOP principle, and T-SQL is not an OOP language, but the basis for the principle seems like it should apply here as well.
There is another principle that I like even more than SRP: DRY. If you call this sequence in only one place, you can think about single responsibility. But when you repeat the same sequence of actions several times, DRY pushes me to remove the duplication.
BTW, it just came to my mind that you can avoid "Or" in the procedure/method name. The UpdateOrInsert operation has a very good name: Save. I don't think it breaks SRP.
Personally I don't believe that this principle applies completely in SQL Server. Stored procedures don't always perform just one action (and I think the notion that a stored procedure is equivalent to a class is flawed). I don't think it makes sense to split every single statement in a stored procedure into its own stored procedure. You can get absolutely ridiculous with this.
There is a balance of course, as you can be ridiculous the other way. You don't want a stored procedure with 18 different ways to specify parameters so that it can do 540 different things based on the combinations.
For an UPSERT I would still suggest that a single stored procedure is fine for this. If you want to feel better about it serving a single purpose, change your update/insert into a single MERGE. :-) That said, and in all seriousness, be very careful with MERGE.
I would disagree that the principle should apply in this case, as it makes for some redundant code in your codebehind.
First, let's examine what your UPSERT does: it checks whether the data exists, then based on that it executes an INSERT or an UPDATE.
To do this in codebehind you have to make 2 calls to your database; depending on how your application is structured, this could also mean opening and closing two connections.
So you have 3 methods in codebehind (one to execute each proc), then a method to call each of those methods and do the logic to decide if you need to insert or update.
You also have 3 separate stored procedures in your database to do each of the actions.
This to me seems like badly structured code, since you would be passing the same parameters to your insert / update procedures as you would to your upsert. It therefore makes sense to do this all in one place.
By using an UPSERT you have 1 stored procedure and only need the one connection, with one method to be called from codebehind. I think that this makes for much better, cleaner code.
If you already have procs that do the Update or Delete operations independently, hopefully with logging for auditing purposes, you could have your upsert proc call those individually. That way only those procs are doing the work which should help keep things manageable, even if they're being called from multiple locations.
The single responsibility principle says that an object should only have one reason to change. The only reason that an Upsert stored procedure should change is if the table structure changes. Thus, I think you are okay in creating upsert stored procedures.
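To make the single-call upsert discussed in this thread concrete, here is a minimal sketch that borrows the Save name suggested above. It uses Python's sqlite3 module rather than T-SQL; the customer table and the save_customer function are hypothetical, chosen only for illustration.

```python
import sqlite3


def save_customer(conn, customer_id, name):
    """Update the row if it exists, otherwise insert it, in one transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE customer SET name = ? WHERE id = ?", (name, customer_id)
        )
        if cur.rowcount == 0:
            # No existing row matched, so this is the insert branch.
            conn.execute(
                "INSERT INTO customer (id, name) VALUES (?, ?)",
                (customer_id, name),
            )
            return "inserted"
        return "updated"
```

The caller makes one call and does not need to know which branch ran, which is the argument made above for keeping the decision in one place.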

Database Security Question

Well, it seems like such a simple solution to the many problems that can arise from insecure services and applications. But I'm not sure if it's possible, or maybe nobody's thought of this idea yet...
Instead of leaving it up to programmers/developers to ensure that their applications use stored procedures/parameterised queries/escape strings etc to help prevent sql injection/other attacks - why don't the people who make the databases just build these security features into the databases so that when an update or insert query is performed on the database, the database secures/sanitizes the string before it is inserted into the database?
The database would not necessarily know the context of what is going on. What is malicious for one application is not malicious for another. Sometimes the intent IS to
drop table users--
It is much better to let the database do what it does best, arranging data. And let the developers worry about the security implementations.
The problem is that the database cannot readily tell whether the command it is requested to execute is legitimate or not - it is syntactically valid and there could be a valid reason for the user to request that it be executed.
There are heuristics that the DBMS could apply. For example, if a single request combined both a SELECT operation and a DELETE operation, it might be possible to infer that this is more likely to be illegitimate than legitimate - and the DBMS could reject that combined operation. But it is hard to deal with a query where the WHERE condition has been weakened to the point that it shows more data than it was supposed to. A UNION query can deliberately select from multiple tables. It is not sufficient to show that there is a weak condition and a strong condition OR'd together - that could be legitimate.
Overall, then, the problem is that the DBMS is supposed to be able to execute a vast range of queries - so it is essentially impossible to be sure that any query it is given to execute is, or is not, legitimate.
The proper way to access the database is with stored procedures. If you were using SQL Server and C#/VB.NET you could use LINQ to SQL, which allows you to build the query in the language, which then gets turned into a parameterized query. Good stuff.
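The difference between building SQL by string concatenation and using a parameterized query can be shown in a few lines. This is a minimal sketch with Python's sqlite3 module; the users table and both function names are hypothetical.

```python
import sqlite3


def find_user_unsafe(conn, name):
    # The input becomes part of the SQL text, so it can change the query.
    sql = "SELECT id FROM users WHERE name = '" + name + "'"
    return conn.execute(sql).fetchall()


def find_user_safe(conn, name):
    # The input is bound as a value and is never parsed as SQL.
    return conn.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchall()
```

Passing the input x' OR '1'='1 to the unsafe version weakens the WHERE condition so that it matches every row, while the safe version simply looks for a user with that literal name. The database executes both statements as valid SQL, which is why it cannot make this distinction on the application's behalf.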

Stored Procedures Vs. Views

I have used both, but what I am not clear on is when I should prefer one over the other. I mean, I know a stored procedure can take in parameters... but really we can still do the same thing using Views too, right?
So, considering performance and other aspects, when and why should I prefer one over the other?
Well, I'd use stored proc for encapsulation of code and control permissions better.
A view is not really encapsulation: it's a macro that expands. If you start joining views, pretty soon you'll have some horrendous queries. Yes, they can be JOINed, but they shouldn't be.
That said, views are a tool that has its place (indexed views, for example), like stored procs.
The advantage of views is that they can be treated just like tables. You can use WHERE to get filtered data from them, JOIN into them, et cetera. You can even INSERT data into them if they're simple enough. Views also allow you to index their results, unlike stored procedures.
A View is just like a single saved query statement, it cannot contain complex logic or multiple statements (beyond the use of union etc). For anything complex or customizable via parameters you would choose stored procedures which allow much greater flexibility.
It's common to use a combination of Views and Stored Procedures in a database architecture, and perhaps for very different reasons. Sometimes it's to achieve backward compatibility in sprocs when schema is re-engineered, sometimes to make the data more manipulatable compared with the way it's stored natively in tables (de-normalized views).
Heavy use of Views can degrade performance as it's more difficult for SQL Server to optimize these queries. However it is possible to use indexed-views which can actually enhance performance when working with joins in the same way as indexed-tables. There are much tighter restrictions on the allowed syntax when implementing indexed-views and a lot of subtleties in actually getting them working depending on the edition of SQL Server.
Think of Views as being more like tables than stored procedures.
The main advantage of stored procedures is that they allow you to incorporate logic (scripting). This logic may be as simple as an IF/ELSE or more complex such as DO WHILE loops, SWITCH/CASE.
I correlate the use of stored procedures to the need for sending/receiving transactions to and from the database. That is, whenever I need to send data to my database, I use a stored procedure. The same is true when I want to update data or query the database for information to be used in my application.
Database views are great to use when you want to provide a subset of fields from a given table, allow your MS Access users to view the data without risk of modifying it, and ensure your reports are going to generate the anticipated results.
Views are useful if there is a certain combination of tables, or a subset of data, that you consistently want to query, for example, a user joined with its permissions. Views should in fact be treated as tables.
Stored procedures are pieces of SQL code that are 'compiled', as it were, to run more optimally than a random other query. The execution plan of SQL code in a stored procedure is already built, so execution runs slightly smoother than that of an ordinary SQL statement.
Two rationales.
Use a stored procedure instead of a view if you don't want insertion to be possible. Inserting into a view may not do what it seems to do. It will insert a row into the underlying table that may not match the view's query, and that row will then not appear in the view: it is inserted somewhere, but not where the statement makes it seem.
Use a view if you can't use the result of a stored procedure from another stored procedure (I was never able to make the latter work, at least with MySQL).
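To make the "treat a view like a table" point from this thread concrete, here is a minimal sketch of the user-joined-with-permissions example mentioned above. It uses Python's sqlite3 module instead of SQL Server, and the table, view and function names are all hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two tables and a view joining them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE permissions (user_id INTEGER, permission TEXT);
        INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
        INSERT INTO permissions VALUES (1, 'read'), (1, 'write'), (2, 'read');
        CREATE VIEW user_permissions AS
            SELECT u.id AS user_id, u.name, p.permission
            FROM users u JOIN permissions p ON p.user_id = u.id;
    """)
    return conn


def permissions_for(conn, name):
    # The view is filtered with WHERE exactly as a table would be.
    rows = conn.execute(
        "SELECT permission FROM user_permissions WHERE name = ? ORDER BY permission",
        (name,),
    )
    return [row[0] for row in rows]
```

The join is written once in the view, and callers only add their own filter on top of it. A stored procedure would instead fix the whole query and expose only its parameters.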

Set Based Operations and calling Stored Procedures

I am working on a stored procedure that performs some operations on students in the class
In the last step it updates status of some of the students based on some criteria.
It is all pretty straight forward but I have a dilemma here.
Basically there is an existing sp in the system called
pUpdateStudentStatus(studentID, statusID, comments, userID)
This sp is used by the application whenever a status of a single user is to be updated. Apart from updating the status it also logs the changes in the StudentStatusHistory table.
So here is my dilemma:
if I want to use that stored procedure, I need to loop through the records (either with a cursor or by writing the loop myself)
if I want to keep all operations set-based, I need to copy the logic from pUpdateStudentStatus (which may change in the future)
Are there any other options? Which one would you choose?
I believe an alternative approach with an update trigger is not the way to go, as I need some extra details such as the userID of the user that changed the status, and the comments.
I am using SQL Server 2005.
You don't say whether pUpdateStudentStatus is under your control or created by a third party.
If it's a third party SP, I don't think you have a lot of choice but to use a cursor/loop, since the internals of the SP may change in future releases.
If the SP is under your control, another option would be to create a version of pUpdateStudentStatus with a new name which will operate in a set-based fashion (perhaps by accepting a table variable of arguments), then re-write the existing pUpdateStudentStatus to act as a wrapper calling the new procedure with a single row in the argument table.
Personally unless performance is an issue (and it sounds like this is most likely the sort of job that will run occasionally and maybe even scheduled outside work hours) I would loop over the existing procedure. CPU is invariably cheaper than DBA/Programmer time and maintenance considerations should override efficiency unless there is an impact on the business by not doing so. Either way you should document why you have adopted whichever approach you choose in the code.
Also, if you don't already have a documentation regime, I would suggest setting up a simple documentation table within the database with (at least) the sp name and descriptive text. Because of the nature of stored procedures/user defined functions, keeping an overview of what functionality has been implemented where can be tricky unless some strategy is adopted. I've seen far too many databases where there is a mass of stored procedures/udfs and no simple method of grokking what functionality has been implemented where. Version control and full documentation are to be applauded if your group supports them, but if they aren't available then documenting the database inside itself is simple, robust, and a quick win.
If you want to keep the operation set-based then yes, sadly, you will need to copy and paste the sql from pUpdateStudentStatus.
You will need to decide between the performance of a set-based update on the one hand, and code re-use (and ease-of-maintenance) on the other. I know which I would normally choose, but your choice depends on your need for performance versus other considerations.
If you are doing a small number of records, looping is acceptable, but if the batch processes ever get big, you will need set-based code.
Another alternative to what others suggested, if you end up needing the set-based logic, is to change the proc to allow either set-based or individual inserts. By making the parameters optional (your GUI will need to check that all required parameters are passed for individual inserts) and adding a parameter for a batch number to be passed in for set-based operation, you can put the logic for both in one proc.
If the batch number is null, do the current actions. If it is passed, go to the batch processing part of the proc. For batch processes, the insert proc can be called by another proc that generates a new batch number and inserts the info you want to insert, including the batch number, into a work table. It then uses the batch number as the input parameter for the insert proc.
You still will have to write the logic for both cases, but since they are in the same proc, they will be easier to maintain and you will be less likely to forget to update both processes.
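The wrapper idea suggested earlier in this thread (a set-based routine that reads its arguments from a table, plus a single-row procedure that delegates to it) can be sketched as follows. This is only an illustration in Python with sqlite3, not T-SQL. The table names, column names and function names are hypothetical, and a regular work table named pending_status stands in for the table variable.

```python
import sqlite3


def create_schema(conn):
    conn.executescript("""
        CREATE TABLE student (id INTEGER PRIMARY KEY, status_id INTEGER);
        CREATE TABLE student_status_history (
            student_id INTEGER, status_id INTEGER, comments TEXT, user_id INTEGER);
        CREATE TABLE pending_status (
            student_id INTEGER, status_id INTEGER, comments TEXT, user_id INTEGER);
    """)


def apply_pending_status_changes(conn):
    """Set-based: log and apply every row in pending_status without a loop."""
    with conn:  # one transaction for the history rows and the update
        conn.execute("""
            INSERT INTO student_status_history (student_id, status_id, comments, user_id)
            SELECT student_id, status_id, comments, user_id FROM pending_status
        """)
        conn.execute("""
            UPDATE student
            SET status_id = (SELECT p.status_id FROM pending_status p
                             WHERE p.student_id = student.id)
            WHERE id IN (SELECT student_id FROM pending_status)
        """)
        conn.execute("DELETE FROM pending_status")


def update_student_status(conn, student_id, status_id, comments, user_id):
    """Single-row wrapper: queue one change, then reuse the set-based logic."""
    conn.execute(
        "INSERT INTO pending_status VALUES (?, ?, ?, ?)",
        (student_id, status_id, comments, user_id),
    )
    apply_pending_status_changes(conn)
```

The update and logging logic lives in one routine, so the single-row path and the batch path cannot drift apart when that logic changes.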

Stored procedures or inline queries?

First of all there is a partial question regarding this, but it is not exactly what I'm asking, so, bear with me and go for it.
My question, after looking at what SubSonic does and the excellent videos from Rob Conery, is this: shall we use a tool like this and write inline queries, or shall we run the queries through calls to stored procedures?
I don't want to minimize any of Rob's work (which I think is amazing), but I just want your opinion on this because I need to start a new project and I'm on the fence: shall I use SubSonic (or a similar tool, like NHibernate), or shall I just continue with my method, which is to always call a stored procedure even if it's as simple as
Select this, that from myTable where myStuff = StackOverflow;
It doesn't need to be one or the other. If it's a simple query, use the SubSonic query tool. If it's more complex, use a stored procedure and load up a collection or create a dataset from the results.
See here: What are the pros and cons to keeping SQL in Stored Procs versus Code and here SubSonic and Stored Procedures
See answers here and here. I use sprocs whenever I can, except when red tape means it takes a week to make it into the database.
Stored procedures are gold when you have several applications that depend on the same database. They let you define and maintain query logic once, rather than in several places.
On the other hand, it's pretty easy for stored procedures themselves to become a big jumbled mess in the database, since most systems don't have a good method for organizing them logically. And they can be more difficult to version and track changes.
I wouldn't personally follow rigid rules. Certainly during the development stages, you want to be able to quickly change your queries so I would inline them.
Later on, I would move to stored procedures because they offer the following two advantages. I'm sure there are more but these two win me over.
1/ Stored procedures group the data and the code for manipulating/extracting that data at one point. This makes the life of your DBA a lot easier (assuming your app is sizable enough to warrant a DBA) since they can optimize based on known factors.
One of the big bugbears of a DBA is ad-hoc queries (especially by clowns who don't know what a full table scan is). DBAs prefer to have nice consistent queries that they can tune the database to.
2/ Stored procedures can contain logic which is best left in the database. I've seen stored procs in DB2/z with many dozens of lines but all the client has to code is a single line like "give me that list".
Because the logic for "that list" is stored in the database, the DBAs can modify how it's stored and extracted at will without compromising or changing the client code. This is similar to the encapsulation that made object-oriented languages 'cleaner' than what came before.
I've done a mix of inline queries and stored procedures. I prefer more of the stored procedure/view approach as it gains a nice spot for you to make a change if needed. When you have inline queries you always have to go and change the code to change an inline query and then re-roll the application. You also might have the inline query in multiple places so you would have to change a lot more code than with one stored procedure.
Then again, if you have to add a parameter to a stored procedure, you're still changing a lot of code anyway.
Another consideration is how often the data changes behind the stored procedure. Where I work, we have third-party tables that may be broken up into normalized tables, or a table may become obsolete. In that case a stored procedure/view may minimize the exposure you have to that change.
I've also written an entire application without stored procedures. It had three classes and 10 pages, so they were not worth it at all. I think there comes a point where it's overkill, or where it can be justified, but it also comes down to your personal opinion and preference.
Are you going to only ever access your database from that one application?
If not, then you are probably better off using stored procedures so that you can have a consistent interface to your database.
Is there any significant cost to distributing your application if you need to make a change?
If so, then you are probably better off using stored procedures which can be changed at the server and those changes won't need to be distributed.
Are you at all concerned about the security of your database?
If so, then you probably want to use stored procedures so that you don't have to grant direct access to tables to a user.
If you're writing a small application, without a wide audience, for a system that won't be used or accessed outside of your application, then inline SQL might be ok.
With Subsonic you will use inline SQL, views and stored procedures. Subsonic makes data access easier, but you can't do everything in a Subsonic query, though the latest version, 2.1, is getting better.
For basic CRUD operations, inline SQL will be straightforward. For more complex data needs, a view will need to be made and then you will do a Subsonic query on the view.
Stored procs are good for harder data computations and data retrieval. Set-based retrieval is usually faster than procedural processing.
Current Subsonic application uses all three options with great results.
I prefer inline sql unless the stored procedure has actual logic (variables, cursors, etc) involved. I have been using LINQ to SQL lately, and taking the generated classes and adding partial classes that have some predefined, common linq queries. I feel this makes for faster development.
Edit: I know I'm going to get downmodded for this. If you ever talk down on foreign keys or stored procedures, you will get downmodded. DBAs need job security I guess...
The advantages of stored procedure (to my mind)
The SQL is in one place
You are able to get query plans.
You can modify the database structure if necessary to improve performance
They are compiled and thus those query plans do not have to get constructed on the fly
If you use permissions - you can be sure of the queries that the application will make.
The best advantage of using stored procs that I have found is that when we want to change the logic, with inline queries we need to go to every place and change it and re-roll every application, but with a stored proc the change is required in only one place.
So use inline queries when you have clear logic; otherwise prefer stored procs.