Scenario:
Each time data is inserted into, updated in, or deleted from a table, up to three things need to happen:
The data needs to be logged to a separate table
Referential integrity must be enforced on implicitly related data (I'm referring to data that should be linked with a foreign key relationship, but isn't: e.g. when updating Table1.Name, Table2.Name should also be updated to the same value)
Arbitrary business logic needs to execute
The architecture and schema of the database must not be changed and the requirements must be accomplished by using triggers.
Question
Which option is better?:
A single trigger per operation (insert/update/delete) that handles multiple concerns (logs, enforces implicit referential integrity, and executes arbitrary business logic). This trigger could be named D_TableName ("D" for delete).
Multiple triggers per operation that were segregated by concern. They could be named:
D_TableName_Logging - for logging when something is deleted from the table
D_TableName_RI
D_TableName_BL
I prefer option 2 because a single unit of code has a single concern. I am not a DBA, and know enough about SQL Server to make me dangerous.
Are there any compelling reasons to handle all of the concerns in a single trigger?
Wow, you are in a no-win situation. Whoever requested that all this stuff be done via triggers should be shot and then fired. Enforcing RI via triggers?
You said the architecture and schema of the database must not be changed. However, by creating triggers, you are, at the very least, changing the schema of the database, and, it could be argued, the architecture.
I would probably go with option #1 and create additional stored procs and UDFs that take care of logging, BL and RI so that code is not duplicated among the individual triggers (the triggers would call these stored procs and/or UDFs). I really don't like naming the triggers the way you proposed in option 2.
BTW, please tell someone at your organization that this is insane. RI should not be enforced via triggers and business logic DOES NOT belong in the database.
Doing it all in one trigger might be more efficient, in that you can possibly end up with fewer operations against the (unindexed) inserted and deleted tables.
Also, when you have multiple triggers, you can set which one fires first and which fires last, but any others fire in arbitrary order, so you can't control the sequence of events deterministically if you have more than three triggers for a particular action.
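For reference, that first/last ordering is controlled with SQL Server's sp_settriggerorder; a minimal sketch using the names from option 2:

EXEC sp_settriggerorder
    @triggername = 'D_TableName_Logging',
    @order = 'First',       -- 'First', 'Last', or 'None'
    @stmttype = 'DELETE';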
If neither of those considerations apply then it's just a matter of preference.
Of course it goes without saying that the specification to do this with triggers sucks.
I agree with @RandyMinder. However, I would go one step further. Triggers are not the right way to approach this situation. The logic that you describe is too complicated for the trigger mechanism.
You should wrap inserts/updates/deletes in stored procedures. These stored procedures can manage the business logic and logging and so on. Also, they make it obvious what is happening. A chain of stored procedures calling stored procedures is explicit. A chain of triggers calling triggers is determined by insert/update/delete statements that do not make the call to the trigger explicit.
The problem with triggers is that they introduce dependencies and locking among disparate tables, and it can be a nightmare to disentangle the dependencies. Similarly, it can be a nightmare to determine performance bottlenecks when the problem may be located in a trigger calling a trigger calling a stored procedure calling a trigger.
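To illustrate the explicit alternative, here is a minimal sketch of such a wrapper proc; all table, column, and proc names are hypothetical:

CREATE PROCEDURE dbo.DeleteFromTable1
    @Id INT
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRANSACTION;
        -- log first, while the row still exists
        INSERT INTO dbo.Table1_Log (Id, Name, DeletedAt)
        SELECT Id, Name, GETDATE() FROM dbo.Table1 WHERE Id = @Id;
        -- enforce the implicit relationship explicitly, then delete
        DELETE FROM dbo.Table2 WHERE Table1Id = @Id;
        DELETE FROM dbo.Table1 WHERE Id = @Id;
    COMMIT TRANSACTION;
END

Everything that happens on a delete is visible in one place, with no hidden trigger chain.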
If you are using Microsoft SQL Server and you're able to modify the code performing the DML statements, you could use the OUTPUT clause to capture the updated, inserted, and deleted values into temporary tables or table variables instead of using triggers. This keeps the performance penalty to a minimum.
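For example, a minimal sketch of OUTPUT capturing deleted rows; table and column names are hypothetical, and @SomeId is assumed to be declared earlier:

DECLARE @deleted TABLE (Id INT, Name NVARCHAR(100));

DELETE FROM dbo.Table1
OUTPUT deleted.Id, deleted.Name INTO @deleted
WHERE Id = @SomeId;

-- the captured rows can now drive logging, implicit-RI fix-ups, etc.
INSERT INTO dbo.Table1_Log (Id, Name, DeletedAt)
SELECT Id, Name, GETDATE() FROM @deleted;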
I have little experience with Liquibase so far, but I have read that you should always have a rollback strategy.
https://www.liquibase.org/get-started/best-practices
Always have a rollback plan
Write changesets in a way that they can be rolled back. For example, use a relevant change clause instead of using a custom <sql> tag. Include a <rollback> clause whenever a change doesn't support an out-of-box rollback (e.g., sql, insert, etc.). Learn more about rollbacks.
When data needs to be modified, or (obsolete) data deleted, I don't know how to handle this in a good way.
I found examples where database entries are deleted and the rollback definition contains two insert statements with fixed data. But this is not a real-world scenario. Here we possibly have tables with millions of records.
Are there any best practices regarding data manipulation/transformation/deletion and the corresponding rollback strategy?
Should data manipulation / destructive operations generally be avoided at all costs?
Hint: I'm not talking about small applications, but enterprise solutions that are developed by many people in different teams and delivered to hundreds of customers. Just to give you some context.
There are various opinions about this but, in real-life situations, I don't find rollback strategies very practical. You typically need to roll forward. That said, Liquibase is meant for structural (DDL) changes.
For data, row data in particular in the millions, I would orchestrate this outside of Liquibase as part of your release plans. I have seen large proprietary bank software handle this by renaming/copying the table to another table and then making the change.
So, if you have tableX and you need to add a column, you can copy the table to tableX_old, then modify tableX.
If the migration succeeds, delete tableX_old. If not, roll back by using tableX_old to restore your data.
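In T-SQL terms, that copy-then-modify approach might look like this sketch (table and column names are hypothetical):

SELECT * INTO dbo.tableX_old FROM dbo.tableX;    -- keep a restorable copy
ALTER TABLE dbo.tableX ADD NewColumn INT NULL;   -- the actual change
-- on success:
DROP TABLE dbo.tableX_old;
-- on failure, restore from the copy instead:
-- DROP TABLE dbo.tableX;
-- EXEC sp_rename 'dbo.tableX_old', 'tableX';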
When implementing "Triggers" , is it required to check the criteria/condition for trigger WHENEVER a change is made? If it is so, can someone have a large number of triggers?
Or else, how is it done?
Triggers add overhead to the operation they are working on. As with any other stored code, triggers can be made very complex and time consuming. You generally want to avoid those.
Part of the issue with triggers is that they often hold locks on tables/rows. This can slow down other components of a query.
The number of triggers depends on the database. Some limit to a single trigger per action per table.
In general, it is considered good practice to use other mechanisms where they suffice -- for instance, NOT NULL constraints, DEFAULT values, and CHECK constraints.
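For instance, a hypothetical table showing those declarative mechanisms in place of triggers:

CREATE TABLE dbo.Orders (
    OrderId   INT      NOT NULL PRIMARY KEY,          -- nullability enforced declaratively
    CreatedAt DATETIME NOT NULL DEFAULT GETDATE(),    -- DEFAULT instead of an insert trigger
    Quantity  INT      NOT NULL CHECK (Quantity > 0)  -- CHECK instead of a validation trigger
);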
I tend to avoid triggers because they are hard to maintain. But there are well-designed, highly performant systems that do use them.
I like to use Upsert stored procedures that update records if they exist or insert them if they don't. Without them, I would need to first find out if the record exists, and then have two separate stored procedures that I would call based on the result.
I never really thought about the issue before today when I was creating a stored procedure called UpdateOrDeleteRow. As soon as I found myself including "Or" in the name, my SRP spider sense kicked in, and I realized that the upserts are basically the same thing.
Is this a violation of SRP? If so, is it acceptable? If not, what should I do?
I realize that the SRP is an OOP principle, and T-SQL is not an OOP language, but the basis for the principle seems like it should apply here as well.
There is another principle, which I like even more than SRP: DRY. So if you call this sequence in one place, you can think about single responsibility. But when you are repeating the same sequence of actions several times, DRY makes me remove the duplication.
BTW, it just came to my mind that you can avoid "Or" in the procedure/method name. The UpdateOrInsert operation has a very good name: Save. I think it does not break SRP.
Personally I don't believe that this principle applies completely in SQL Server. Stored procedures don't always perform just one action (and I think the notion that a stored procedure is equivalent to a class is flawed). I don't think it makes sense to split every single statement in a stored procedure into its own stored procedure. You can get absolutely ridiculous with this.
There is a balance of course, as you can be ridiculous the other way. You don't want a stored procedure with 18 different ways to specify parameters so that it can do 540 different things based on the combinations.
For an UPSERT I would still suggest that a single stored procedure is fine for this. If you want to feel better about it serving a single purpose, change your update/insert into a single MERGE. :-) That said, and in all seriousness, be very careful with MERGE.
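For illustration, a hedged sketch of the MERGE form of an upsert; dbo.Customers, its columns, and the @CustomerId/@Name parameters are all hypothetical:

MERGE dbo.Customers WITH (HOLDLOCK) AS target   -- HOLDLOCK guards against upsert race conditions
USING (SELECT @CustomerId AS CustomerId, @Name AS Name) AS source
    ON target.CustomerId = source.CustomerId
WHEN MATCHED THEN
    UPDATE SET Name = source.Name
WHEN NOT MATCHED THEN
    INSERT (CustomerId, Name) VALUES (source.CustomerId, source.Name);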
I would disagree that the principle should apply in this case, as it makes for some redundant code in your codebehind.
First, let's examine what your UPSERT does: it checks if data exists, then based on that it executes an INSERT or an UPDATE.
To do this in codebehind, you have to make two calls to your database; depending on how your application is structured, this could also mean opening and closing two connections.
So you have three methods in codebehind (one to execute each proc), then a method to call each of those methods and do the logic to decide whether you need to insert or update.
You also have three separate stored procedures in your database to do each of the actions.
This to me seems like badly structured code, since you would be passing the same parameters to your insert/update procedures as you would to your upsert; it therefore makes sense to do this all in one place.
By using an UPSERT you have one stored procedure and need only one connection, with one method to be called from codebehind. I think that this makes for much better, cleaner code.
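As a rough sketch of the single-proc exists-check pattern described above (all names are hypothetical):

CREATE PROCEDURE dbo.UpsertCustomer
    @CustomerId INT,
    @Name NVARCHAR(100)
AS
BEGIN
    SET NOCOUNT ON;
    -- note: under heavy concurrency, consider a transaction with UPDLOCK/HOLDLOCK
    IF EXISTS (SELECT 1 FROM dbo.Customers WHERE CustomerId = @CustomerId)
        UPDATE dbo.Customers SET Name = @Name WHERE CustomerId = @CustomerId;
    ELSE
        INSERT INTO dbo.Customers (CustomerId, Name) VALUES (@CustomerId, @Name);
END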
If you already have procs that do the Update or Insert operations independently, hopefully with logging for auditing purposes, you could have your upsert proc call those individually. That way only those procs are doing the work, which should help keep things manageable, even if they're being called from multiple locations.
The single responsibility principle says that an object should only have one reason to change. The only reason that an Upsert stored procedure should change is if the table structure changes. Thus, I think you are okay in creating upsert stored procedures.
I hate triggers. I've been tripped up by them way too many times. But I'd like to know: is it better to specify the time every time a row is updated, or to let a trigger take care of it to keep code to a minimum?
Just to clarify, the table the trigger would be on would have a column called something like LastModified
The particular scenario I'm dealing with is I am one of the developers who uses a database with about 400 stored procedures. There are about 20 tables that would have this LastModified column. These tables would be updated by about 10 different stored procedures each.
Triggers can definitely be a huge problem, especially if there are multiple layers of them. It makes debugging, performance tuning, and understanding the data logic almost impossible.
But if you keep your design to a single layer (one trigger for the table), and it is used solely for auditing (i.e. putting in the updated time), I don't think that'll be a big problem at all.
Equally, if you are using a stored procedure as the sole actor on your tables and views, I think it would make just as much sense (and be a lot easier to remember and look back on) to have your stored procedure put in the current datetime stamp. I think that's a great design.
But if you're using ad hoc queries and you have a datetime field that is not null, it will be a hindrance to remember to supply the current datetime. Obviously this won't be a problem with the two aforementioned ideas, the stored procedure or the trigger.
So I think in this situation it should be personal preference (as long as you don't make a string of triggers that become spaghetti).
You can always use a TIMESTAMP column (in MySQL). If it's MSSQL, just note that the timestamp (rowversion) column is an incrementing row-version number, not a DateTime data type.
Generally, I avoid triggers like the plague (close to on par with cursors) and so always just update the column manually. To me, it helps reinforce the business logic. But in truth, "better" is a personal opinion.
Generally, triggers are most useful as a bottleneck if you have a table that needs certain business logic applied for every update, and you have updates coming from different sources.
For example suppose you have php and python apps that both use the database. You could try and remember to update the python code every time you update the php, and vice versa, and hope that they both work the same. Or you could use a trigger, so that no matter what client connects, the same thing will happen to the table.
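A hedged illustration of that idea in T-SQL; the table, column, and rule are hypothetical:

CREATE TRIGGER trg_Users_NormalizeEmail ON dbo.Users
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- the same rule applies no matter which client (PHP, Python, ad hoc) wrote the row
    UPDATE u
    SET Email = LOWER(i.Email)
    FROM dbo.Users u
    INNER JOIN inserted i ON u.UserId = i.UserId;
END
-- (with RECURSIVE_TRIGGERS off, the default, this self-update does not re-fire the trigger)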
In most other cases triggers are a bad idea and will cause more pain than they are worth.
It's true that some triggers can be a source of confusion and frustration when debugging, especially when there are cascading foreign key updates.
But they do have their use.
In this case, you have a choice between updating 400 stored procedures to set the date manually, or using a trigger. If you miss one of those stored procs, then the field is as good as useless.
Also, while triggers 'hide' functionality, the same can be said of explicitly updating the field in a stored procedure. What happens when someone writes a new stored procedure? You need to document that the field should be updated.
Personally, if it's critical that the field is up to date, I would use a trigger. Use an INSTEAD OF trigger, so that rather than performing the update and then having a trigger fire after it, you are simply overriding the insert statement.
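A minimal sketch of the trigger approach, shown here as a plain AFTER UPDATE trigger rather than the INSTEAD OF variant, for brevity; the table and key names are hypothetical:

CREATE TRIGGER trg_Orders_LastModified ON dbo.Orders
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- stamp every updated row, regardless of which of the ~10 procs touched it
    UPDATE o
    SET LastModified = GETDATE()
    FROM dbo.Orders o
    INNER JOIN inserted i ON o.OrderId = i.OrderId;
END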
I am working on a stored procedure that performs some operations on students in the class
In the last step it updates status of some of the students based on some criteria.
It is all pretty straightforward, but I have a dilemma here.
Basically there is an existing sp in the system called
pUpdateStudentStatus(studentID, statusID, comments, userID)
This sp is used by the application whenever the status of a single student is to be updated. Apart from updating the status, it also logs the change in the StudentStatusHistory table.
So here is my dilemma,
if I want to use that stored procedure, I need to loop through the records (either with a cursor or by writing the loop myself)
if I want to keep all operations set-based, I need to copy the logic from pUpdateStudentStatus (which may change in the future)
Are there any other options? Which one would you choose?
I believe the alternative approach of an update trigger is not the way to go, as I need some extra details such as the userId of the user who changed the status, and the comments.
I am using SQL Server 2005.
You don't say whether pUpdateStudentStatus is under your control or created by a third party.
If it's a third party SP, I don't think you have a lot of choice but to use a cursor/loop, since the internals of the SP may change in future releases.
If the SP is under your control, another option would be to create a version of pUpdateStudentStatus with a new name which will operate in a set-based fashion (perhaps by accepting a table variable of arguments), then re-write the existing pUpdateStudentStatus to act as a wrapper calling the new procedure with a single row in the argument table.
Personally, unless performance is an issue (and it sounds like this is most likely the sort of job that will run occasionally, maybe even scheduled outside work hours), I would loop over the existing procedure. CPU is invariably cheaper than DBA/programmer time, and maintenance considerations should override efficiency unless there is an impact on the business. Either way, you should document in the code why you adopted whichever approach you choose.
Also, if you don't already have a documentation regime, I would suggest setting up a simple documentation table within the database with (at least) the sp name and descriptive text. Because of the nature of stored procedures/user-defined functions, keeping an overview of what functionality has been implemented where can be tricky unless some strategy is adopted, and I've seen far too many databases where there is a mass of stored procedures/UDFs and no simple way of grokking what functionality lives where. Version control and full documentation are to be applauded if your group supports them, but if that isn't available, documenting the database inside itself is simple, robust, and a quick win.
If you want to keep the operation set-based then yes, sadly, you will need to copy and paste the SQL from pUpdateStudentStatus.
You will need to decide between the performance of a set-based update on the one hand, and code re-use (and ease-of-maintenance) on the other. I know which I would normally choose, but your choice depends on your need for performance versus other considerations.
If you are doing a small number of records, looping is acceptable, but if the batch processes ever get big, you will need set-based code.
Another alternative to what others suggested, if you end up needing the set-based logic, is to change the proc to allow either set-based or individual inserts. By making the parameters optional (your GUI will need to check that all required parameters are passed for individual inserts) and adding a parameter for a batch number to be passed in for set-based operation, you can put the logic for both in one proc.
If the batch number is null, do the current actions. If it is passed, go to the batch-processing part of the proc. For batch processes, the insert proc can be called by another proc that generates a new batch number and inserts the info you want to insert into a work table, including the batch number. It then uses the batch number as the input parameter for the insert proc.
You will still have to write the logic for both cases, but since they are in the same proc, they will be easier to maintain and you will be less likely to forget to update both processes.
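A hedged sketch of that combined proc; the work table (dbo.StudentStatusWork), the dbo.Students table, and the body of the two branches are hypothetical:

CREATE PROCEDURE dbo.pUpdateStudentStatus
    @StudentID   INT          = NULL,
    @StatusID    INT          = NULL,
    @Comments    VARCHAR(500) = NULL,
    @UserID      INT          = NULL,
    @BatchNumber INT          = NULL
AS
BEGIN
    SET NOCOUNT ON;
    IF @BatchNumber IS NULL
    BEGIN
        -- existing single-row path: update one student and log it
        UPDATE dbo.Students SET StatusID = @StatusID WHERE StudentID = @StudentID;
        INSERT INTO dbo.StudentStatusHistory (StudentID, StatusID, Comments, UserID)
        VALUES (@StudentID, @StatusID, @Comments, @UserID);
    END
    ELSE
    BEGIN
        -- set-based path: process every row staged under this batch number
        UPDATE s
        SET StatusID = w.StatusID
        FROM dbo.Students s
        INNER JOIN dbo.StudentStatusWork w ON w.StudentID = s.StudentID
        WHERE w.BatchNumber = @BatchNumber;

        INSERT INTO dbo.StudentStatusHistory (StudentID, StatusID, Comments, UserID)
        SELECT StudentID, StatusID, Comments, UserID
        FROM dbo.StudentStatusWork
        WHERE BatchNumber = @BatchNumber;
    END
END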