I have little experience with Liquibase so far, but I have read that you should always have a rollback strategy.
https://www.liquibase.org/get-started/best-practices
Always have a rollback plan
Write changesets in a way that they can be rolled back. For example, use a relevant change type instead of using a custom sql tag. Include a rollback change whenever a change doesn't support an out-of-box rollback (e.g., sql, insert, etc.). Learn more about rollbacks.
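For illustration, in Liquibase's formatted-SQL style such a changeset can carry its own rollback; the changeset author, id, and table below are made up:

```sql
--liquibase formatted sql

--changeset jane.doe:42
-- Raw sql/insert changes have no automatic rollback, so one is declared explicitly.
ALTER TABLE customer ADD preferred_channel VARCHAR(20);
UPDATE customer SET preferred_channel = 'EMAIL' WHERE preferred_channel IS NULL;
--rollback ALTER TABLE customer DROP COLUMN preferred_channel;
```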
In case of needed data modification or when deleting (obsolete) data, I don't know how to handle this in a good way.
I found examples where database entries are deleted and the rollback definition contains two insert statements with fixed data. But this is not a real-world scenario; here we may have tables with millions of records.
Are there any best practices regarding data manipulation / transformation / deletion and the corresponding rollback strategy?
Should data manipulation / destructive operations generally be avoided at all costs?
Hint: I'm not talking about small applications, but about enterprise solutions that are developed by many people in different teams and delivered to hundreds of customers. Just to give you some context.
There are various opinions about this but, in real-life situations, I don't find rollback strategies very practical. You typically need to roll forward. With that said, Liquibase was meant for structural (DDL) changes.
For data, row data in the millions in particular, I would orchestrate this outside of Liquibase as part of your release plans. I have seen large proprietary bank software handle this by renaming/copying the table to another table and then making the change.
So, if you have tableX and you need to add a column, you can copy the table to tableX_old and modify tableX.
If the migration succeeds, delete tableX_old. If not, rollback using tableX_old to restore your data.
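A rough T-SQL sketch of that copy-then-modify pattern (tableX comes from the example above; the new column and the exact restore steps are placeholders):

```sql
-- Keep a safety copy before the structural change.
-- (SELECT INTO copies data only, not indexes or constraints.)
SELECT * INTO tableX_old FROM tableX;

ALTER TABLE tableX ADD new_column INT NULL;

-- If the migration succeeds:
--   DROP TABLE tableX_old;
-- If it fails, restore from the copy, e.g.:
--   DROP TABLE tableX;
--   EXEC sp_rename 'tableX_old', 'tableX';
```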
I have a set of validations which decide whether a record gets inserted into the database with a valid status code. The issue we are facing is that many users make requests at the same time; in the middle of one transaction another transaction comes in, and both records get inserted with valid status, which shouldn't happen. It should return an error that the record already exists, which could easily be handled by a simple query, but in specific scenarios we do allow duplicates to be inserted. I have tried sp_getapplock, which solves my problem, but it compromises performance big time. Are there any optimal ways to handle concurrent requests?
Thanks.
sp_getapplock is pretty much the beefiest and most arbitrary lock you can take. It functions more like the lock keyword does in OO programming: basically you name a resource, give it a scope (proc or transaction), then lock it. Pretty much nothing can bypass that lock, which is why it has solved your race conditions. It's also probably mad overkill for what you're trying to do.
The first code/architecture idea that comes to mind is to restructure this table. I'm going to assume you have high update volumes or you wouldn't be running into these violations. You could simply use a try/catch block and have the catch block retry on a PK violation. Clumsy, but it might just do the trick.
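A minimal sketch of that retry shape, assuming a hypothetical dbo.Requests table with a unique or primary key on the business key:

```sql
-- Hypothetical table dbo.Requests, unique on BusinessKey.
DECLARE @BusinessKey INT = 42;

BEGIN TRY
    INSERT INTO dbo.Requests (BusinessKey, StatusCode)
    VALUES (@BusinessKey, 'VALID');
END TRY
BEGIN CATCH
    -- 2627 = primary key violation, 2601 = unique index violation
    IF ERROR_NUMBER() IN (2627, 2601)
        RAISERROR('Record already exists', 16, 1);
    ELSE
        THROW;  -- re-raise anything unexpected
END CATCH
```

Instead of the RAISERROR you could also retry, or apply whatever duplicate-allowed rules you have for those specific scenarios.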
Next, you could consider altering the structure of the table which receives this stream of updates throughout the day. Make this table primary-keyed off an identity column and pretty much nothing else. Inserts will be lightning fast, so any blockage will be negligible. You can then move this data in batches into a table better suited for batch processing (as opposed to trying to batch-process in real time).
There is also a whole range of transaction isolation settings which adjust SQL's regular locking system to support different variants (whether at the batch level, or inline via query hints). I'd read up on those, but you might consider looking at serializable isolation in particular. Different settings enforce different runtime rules to fit your needs.
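One common shape of that is to apply the hints only to the existence check, so two concurrent callers can't both pass it; the table and column names below are placeholders:

```sql
DECLARE @BusinessKey INT = 42;

BEGIN TRANSACTION;

-- UPDLOCK + HOLDLOCK takes a serializable-style range lock on the checked key,
-- so a second session blocks here until this transaction commits or rolls back.
IF NOT EXISTS (SELECT 1
               FROM dbo.Requests WITH (UPDLOCK, HOLDLOCK)
               WHERE BusinessKey = @BusinessKey)
BEGIN
    INSERT INTO dbo.Requests (BusinessKey, StatusCode)
    VALUES (@BusinessKey, 'VALID');
END

COMMIT TRANSACTION;
```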
Also be sure to check your transactions. You probably want to be locking the hell out of this table (and potentially during some other action), but once that need is gone, the lock should go too.
I just realized I've had a headache for years. Well, metaphorically speaking. In reality I was looking at my database structure and somehow just realized I never use transactions. Doh.
There's a lot of material on the internet about transactions (begin transaction, rollback, commit, etc.), but surprisingly not much detail about exactly why they are vital, and just how vital.
I understand the concept of handling the case where something goes wrong. That made sense when one is doing multiple updates in one go, for example in multiple tables, but that is bad practice as far as I know and I don't do it. All of my queries just update one table. If a query errors, it cancels, transaction or no transaction. What else could go wrong or potentially corrupt a one-table update, besides my pulling the plug out of my server?
In other words, my question is,
exactly how vital is it that I implement transactions on all of my tables? Am I fully blasphemous for not having them, or does it really matter that much?
UPDATE
+1 to invisal, who pointed out that queries are automatically wrapped in transactions, which I did not know, and who pointed out multiple good references on the subject of my question.
This made a lot of sense when one is doing multiple updates, for example, in multiple tables in one go. But basically all of my queries just update one table at a time. If a query errors, it cancels, transaction or no transaction.
In your case, it does nothing. A single statement is a transaction by itself. For more information you can read these existing questions and answers:
What does a transaction around a single statement do?
Transaction necessary for single update query?
Do i need transaction for joined query?
The most important property of a database is to keep your data, reliably.
Database reliability is assured by conforming to ACID principles (Atomicity, Consistency, Isolation, Durability). In the context of databases, a single logical operation on the data is called a transaction. Without transactions, such reliability would not be possible.
In addition to reliability, using transactions properly lets you improve the performance of some data operations considerably. For example, you can start a transaction, insert a lot of data (say 100k rows), and only then commit. The server does not have to actually write to disk until commit is called, effectively batching data in memory. This can improve performance a lot.
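A trivial sketch of that batching idea (the staging table is hypothetical):

```sql
-- One transaction around the whole load: the synchronous log flush happens
-- once at COMMIT instead of once per row in autocommit mode.
BEGIN TRANSACTION;

DECLARE @i INT = 0;
WHILE @i < 100000
BEGIN
    INSERT INTO dbo.Staging (Id, Payload)
    VALUES (@i, 'row ' + CAST(@i AS VARCHAR(10)));
    SET @i += 1;
END

COMMIT TRANSACTION;
```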
You should be aware that every updating action against your database is performed inside a transaction, even if it touches only one table (SQL Server automatically creates a transaction for it).
The reason for always using transactions is to ensure ACID, as others have mentioned. Here I'd like to elaborate on the isolation point. Without transaction isolation you may run into problems such as dirty reads, non-repeatable reads, phantom reads, and so on.
It depends. If you are updating one table and one row, then the only advantage is going to be in the logging... but if you update multiple rows in a table at one time, then without transactions you could still run into some corruption.
Well, it depends. SQL is most of the time used to hold data for some host language like C, C++, Java, PHP, C#, and others. I have not worked with that many technologies, but if you are using the following combinations, here is my point of view:
SQL with C / C++ : Commit Required
SQL with Java : Not Required
SQL with C# : Not Required
SQL with PHP : Not Required
It also depends on which flavor of SQL you are using: Oracle, SQL Server, SQLite, MySQL, etc.
When you are using Oracle SQL in its console, as in Oracle 11g, Oracle 10g, etc., COMMIT is required.
And as far as corruption of tables and data is concerned: yes, it happens. I had a very bad experience with it. So if you pull the plug or something while you are updating your table, you might end up with a massive disaster.
Concluding, I suggest you do commit.
Scenario:
Each time data is inserted/updated/deleted into/in/from a table, up to 3 things need to happen:
The data needs to be logged to a separate table
Referential integrity must be enforced on implicitly related data (I'm referring to data that should be linked with a foreign key relationship, but isn't: e.g. when updating Table1.Name, Table2.Name should also be updated to the same value)
Arbitrary business logic needs to execute
The architecture and schema of the database must not be changed and the requirements must be accomplished by using triggers.
Question
Which option is better?:
A single trigger per operation (insert/update/delete) that handles multiple concerns (logs, enforces implicit referential integrity, and executes arbitrary business logic). This trigger could be named D_TableName ("D" for delete).
Multiple triggers per operation, segregated by concern. They could be named:
D_TableName_Logging - for logging when something is deleted from TableName
D_TableName_RI - for enforcing the implicit referential integrity
D_TableName_BL - for executing the arbitrary business logic
I prefer option 2 because a single unit of code has a single concern. I am not a DBA, and know enough about SQL Server to make me dangerous.
Are there any compelling reasons to handle all of the concerns in a single trigger?
Wow, you are in a no-win situation. Whoever requested that all this stuff be done via triggers should be shot and then fired. Enforcing RI via triggers?
You said the architecture and schema of the database must not be changed. However, by creating triggers, you are, at the very least, changing the schema of the database, and, it could be argued, the architecture.
I would probably go with option #1 and create additional stored procs and UDFs that take care of the logging, BL and RI, so that code is not duplicated among the individual triggers (the triggers would call these stored procs and/or UDFs). I really don't like naming the triggers the way you proposed in option 2.
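A rough sketch of that shape, with made-up proc names, where the single delete trigger stays thin and delegates to shared code:

```sql
CREATE TRIGGER dbo.D_TableName
ON dbo.TableName
AFTER DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Materialize the deleted rows once; temp tables created here are
    -- visible to the procs called below.
    SELECT * INTO #Deleted FROM deleted;

    EXEC dbo.TableName_LogDelete;         -- writes #Deleted to the audit table
    EXEC dbo.TableName_EnforceImplicitRI; -- fixes up the implicitly related rows
    EXEC dbo.TableName_BusinessLogic;     -- whatever arbitrary rules apply
END;
```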
BTW, please tell someone at your organization that this is insane. RI should not be enforced via triggers and business logic DOES NOT belong in the database.
Doing it all in one trigger might be more efficient, in that you can possibly end up with fewer operations against the (unindexed) inserted and deleted tables.
Also when you have multiple triggers it is possible to set the first and last one that fires but any others will fire in arbitrary order so you can't control the sequence of events deterministically if you have more than 3 triggers for a particular action.
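(In SQL Server the first and last slots are pinned with sp_settriggerorder; the trigger name below follows the naming from the question.)

```sql
-- Only 'First' and 'Last' can be pinned; any other triggers on the same
-- action fire in an undefined order in between.
EXEC sp_settriggerorder
     @triggername = 'dbo.D_TableName_Logging',
     @order       = 'First',
     @stmttype    = 'DELETE';
```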
If neither of those considerations apply then it's just a matter of preference.
Of course it goes without saying that the specification to do this with triggers sucks.
I agree with @RandyMinder. However, I would go one step further. Triggers are not the right way to approach this situation. The logic that you describe is too complicated for the trigger mechanism.
You should wrap inserts/updates/deletes in stored procedures. These stored procedures can manage the business logic and logging and so on. Also, they make it obvious what is happening. A chain of stored procedures calling stored procedures is explicit. A chain of triggers calling triggers is determined by insert/update/delete statements that do not make the call to the trigger explicit.
The problem with triggers is that they introduce dependencies and locking among disparate tables, and it can be a nightmare to disentangle the dependencies. Similarly, it can be a nightmare to determine performance bottlenecks when the problem may be located in a trigger calling a trigger calling a stored procedure calling a trigger.
If you are using Microsoft SQL Server and you're able to modify the code performing the DML statements, you could use the OUTPUT clause to dump the updated, inserted, or deleted values into temporary tables or memory variables instead of using triggers. This will keep the performance penalty to a minimum.
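For example (the audit table and its columns are made up):

```sql
DECLARE @Id INT = 1, @NewName NVARCHAR(100) = N'New name';

-- Capture what the UPDATE changed, with no trigger involved.
UPDATE dbo.TableName
SET    Name = @NewName
OUTPUT deleted.Id, deleted.Name, inserted.Name, GETDATE()
INTO   dbo.TableName_Audit (Id, OldName, NewName, ChangedAt)
WHERE  Id = @Id;
```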
I'm being told by a person with some authority in our company that it's a "database no-no" to create triggers in a database that change rows in another table.
I've used this technique to create default initial configuration, auto-maintained audit logs, and various other things that would have been a nightmare to maintain consistently inside the heterogeneous applications that connect to that database. For over a decade I've read that this is an appropriate way to centralize relationship-constraint maintenance and get the responsibility out of the applications interacting with the data.
As such, my BS meter is pegging with this. Am I missing something fundamentally wrong with that technique that makes it a bad practice in general?
If you are careful with your trigger code, there is nothing inherently bad about it. Some people get bitten by bad trigger code and then decide that triggers are bad (even though it was the bad trigger code that was the problem). They then generalize this as "never use triggers".
The other problem is....
Using audit tables as an example: suppose you have a stored procedure that updates a table AND puts data into an audit table. Now suppose you write trigger code to put data into the audit table as well. You could end up with duplicate audit data.
It's personal preference. In my opinion it's bad practice, though. It can be a bit unruly to manage a DB that has triggers on tables updating other tables, which fires another trigger to update another table, and so on.
To me, it makes more sense to wrap all of the functionality into a stored procedure so all the logic is in one place.
To each their own, though.
It may be a company policy, but it is not a definitive no-no. The problem with doing it, unless you know and control the database, is that the amendments of other tables may be inefficient (and this can often be difficult to identify as an issue), and there is a danger of them cascading: the amendment of this table fires another trigger, which may update another table, and so on.
So I would not call it a no-no as such, but something to be done with care.
I think it's similar to the admonition to avoid goto statements in structured programming. You should look pretty hard to find a "better" answer than putting a bunch of DML into triggers, because it's easy to shoot your foot off by mishandling triggers, but sometimes that's the best answer for the problem at hand.
My experience is mostly in SQL Server. Older versions didn't have change tracking or cascading referential integrity, so you might have had to write triggers to handle logging and referential integrity management yourself in the bad old days. Now there are superior tools for both these functions that are built right into the platform. They're "cleaner" than DML-laden triggers, but still allow you to keep those functions centralized.
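For example, declarative cascading now covers a lot of what RI triggers used to do (the parent/child tables here are hypothetical):

```sql
-- The engine keeps the child rows in sync; no trigger code to maintain.
ALTER TABLE dbo.OrderLine
    ADD CONSTRAINT FK_OrderLine_OrderHeader
    FOREIGN KEY (OrderId) REFERENCES dbo.OrderHeader (OrderId)
    ON UPDATE CASCADE
    ON DELETE CASCADE;
```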
Even so, there are still cases where writing code in a trigger gets the job done best. Don't let dogma stop you from doing your job, if you conclude through analysis that triggers are the tool you need.
Not all triggers that insert rows into another table are bad; for example, a trigger that keeps a previous version of the row in an AUDIT table, as you say. But without knowing the particulars, I do suspect that your "default initial configuration" created via trigger might be better done in a stored procedure, because the SP is more self-documenting.
Idiosyncratic side-effects are to be avoided, and by 'idiosyncratic' I mean one person's idea of a shortcut that circumvents the tried-and-true standard way of doing something. They make ongoing maintenance difficult and can be booby traps for the unwary programmer who happens along later.
I'm still looking for that book in which "best practice" is defined - I think I'll find it next to "The big book of database no-nos".
Personally, I've encouraged teams to avoid triggers where possible.
I've had bad experiences with them - I agree that hitting oneself with a hammer doesn't make all hammers evil, but triggers are more like nail guns than hammers - the opportunity for harm is definitely there.
Triggers are "side effects" - your explicit intention is to insert a record in a table; as a side effect of this intention, a whole bunch of other stuff happens. Most teams I work with don't have "the database guy" - developers work across multiple layers, and keeping all the side effects in their brain is taxing. This creates the opportunity for bugs and performance problems - not because someone is stupid or lazy, but because the technology has been made more complex.
There are problems which are best (or even only) solved through triggers - enforcing referential integrity is a classic. There are problems which are often solved through triggers but which make me nervous - especially when they reflect business rules, or the use of audit tables. And there are problems which, in my view, are too close to the nail-gun end of the spectrum to be solved by triggers, e.g. calculations in the business domain (such as calculating the tax amount for a sale).
We have a bit of a messy database situation.
Our main back-office system is written in Visual FoxPro with local data (yes, I know!)
In order to work effectively with the data in our websites, we have chosen to regularly export data to a SQL database. However, the process that does this basically clears out the tables each time and does a re-insert.
This means we have two SQL databases - one that our FoxPro export process writes to, and another that our websites read from.
This question is concerned with the transform from one SQL database to the other (SqlFoxProData -> SqlWebData).
For a particular table (one of our main application tables), because various data transformations take place during this process, it isn't a matter of straightforward UPDATE, INSERT and DELETE statements using self-joins; we're having to use cursors instead (I know!)
This has been working fine for many months, but now we are starting to hit performance problems when an update is taking place (which can happen regularly during the day).
Basically when we are updating SqlWebData.ImportantTable from SqlFoxProData.ImportantTable, it's causing occasional connection timeouts/deadlocks/other problems on the live websites.
I've worked hard at optimising queries, caching, etc., but it has come to a point where I'm looking for another strategy to update the data.
One idea that has come to mind is to have two copies of ImportantTable (A and B), some concept of which table is currently 'active', updating the non-active table, then switching the currently active table:
i.e. websites read from ImportantTableA whilst we're updating ImportantTableB, then we switch websites to read from ImportantTableB.
Question is: is this feasible and a good idea? I have done something like it before, but I'm not convinced it's necessarily good for optimisation/indexing etc.
Any suggestions welcome, I know this is a messy situation... and the long term goal would be to get our FoxPro application pointing to SQL.
(We're using SQL 2005 if it helps)
I should add that data consistency isn't particularly important in this instance, seeing as the data is always slightly out of date.
There are a lot of ways to skin this cat.
I would attack the locking issues first. It is extremely rare that I would use cursors, and I think improving the performance and locking behavior there might resolve a lot of your issues.
I expect that I would solve it by using two separate staging tables: one for the FoxPro export in SQL, and one transformed into the final format in SQL, side by side. Then either swap the final table into production using sp_rename, or simply use 3 INSERT/UPDATE/DELETE transactions to apply all changes from the final table to production. Either way there is going to be some locking, but how big are we talking about?
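The swap itself can be a quick rename inside a short transaction; ImportantTable comes from the question, the staging-table name is invented:

```sql
BEGIN TRANSACTION;
EXEC sp_rename 'dbo.ImportantTable',       'ImportantTable_old';
EXEC sp_rename 'dbo.ImportantTable_stage', 'ImportantTable';
COMMIT TRANSACTION;

DROP TABLE dbo.ImportantTable_old;  -- or keep it around until the next load
```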
You should be able to maintain one db for the website and just replicate to that table from the other sql db table.
This is assuming that you do not update any data from the website itself.
"For a particular table (one of our main application tables), because various data transformations take places during this process, it's not a straightforward UPDATE, INSERT and DELETE statements using self-joins, but we're having to use cursors instead (I know!)"
I cannot think of a case where I would ever need to perform an insert, update or delete using a cursor. If you can write the SELECT for the cursor, you can convert it into an insert, update or delete. You can join to other tables in these statements and use CASE expressions for conditional processing. Taking the time to do this in a set-based fashion may solve your problem.
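As a generic illustration of the set-based shape (the database names come from the question; the columns and the transformation are invented):

```sql
-- One set-based UPDATE instead of a row-by-row cursor: join source to target
-- and let CASE do the conditional transformation.
UPDATE w
SET    w.Status = CASE WHEN f.Closed = 1 THEN 'CLOSED' ELSE 'OPEN' END,
       w.Amount = f.Amount
FROM   SqlWebData.dbo.ImportantTable    AS w
JOIN   SqlFoxProData.dbo.ImportantTable AS f
       ON f.Id = w.Id;
```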
One thing you may consider if you have lots of data to move: we occasionally create a view over the data we want and then have two tables - one active, and one that the data will be loaded into. When the data has finished loading, as part of your process, run a simple command to switch the view to point at the table you just finished loading. That way the users are only down for a couple of seconds at most, and you won't create locking issues where they are trying to access data as you are loading it.
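A sketch of that view flip (the view, table, and column names are invented):

```sql
-- Readers always query dbo.ImportantTable_v; each load writes to whichever
-- base table is currently inactive, then the view is repointed to it.
ALTER VIEW dbo.ImportantTable_v
AS
SELECT Id, Name, Amount
FROM   dbo.ImportantTable_B;   -- was dbo.ImportantTable_A before this load
GO
```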
You might also look at using SSIS to move the data.
Do you have the option of making the updates more atomic, rather than the stated 'clear out and re-insert'? I think Visual FoxPro supports triggers, right? For your key tables, can you add a trigger to the update/insert/delete to capture the IDs of the records that change, then move (or delete) just those records?
Or how about writing all changes to an offline database, and letting SQL Server replication take care of the sync?
[sorry, this would have been a comment, if I had enough reputation!]
Based on your response to Ernie above, you asked how to replicate databases. Here is Microsoft's how-to about replication in SQL 2005.
However, if you're asking about replication and how to do it, it indicates to me that you are a little light on SQL Server experience. That being said, it's fairly easy to muck things up, and while I'm all for learning by experience, if this is mission-critical data you might be better off hiring a DBA or, at the very least, testing the #$##$% out of this before you actually implement it.