Looking for some insights on using a transaction versus delete queries when a subsequent request fails. In brief, in my application I insert into two tables by calling two stored procedures, and the inserted data is then uploaded to two REST APIs. If either of the REST API calls fails, I have to roll back the data entered into the database.
So which approach is suitable: using a SQL transaction, or deleting the inserted records through a database procedure?
This is an ideal situation for a transaction. How do you know?
Let's say you insert some rows, then make the API calls, then try to delete the inserted rows when a call fails. What happens in that case?
The inserted rows are already readable (even without dirty reads enabled) - they are just normal rows in the database. So every query made before you finish your request will see those rows as well.
And what happens if the delete itself fails? Exactly: the rows just stay in the database, and now you have improper data. Bad.
Use the transaction approach: start a transaction and commit it only once the API calls have finished. This way you ensure that your database contains proper data at all times.
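To make the pattern concrete, here is a minimal sketch, assuming hypothetical stored procedure names (dbo.InsertIntoTable1, dbo.InsertIntoTable2); the two REST calls are made from application code while the transaction is still open, and the application then issues either COMMIT or ROLLBACK:

    -- Minimal sketch; procedure names are placeholders.
    BEGIN TRANSACTION;

    EXEC dbo.InsertIntoTable1;   -- first insert stored procedure
    EXEC dbo.InsertIntoTable2;   -- second insert stored procedure

    -- application code calls REST API #1 and REST API #2 here,
    -- on the same open connection/transaction

    COMMIT TRANSACTION;          -- only when both API calls succeeded
    -- ROLLBACK TRANSACTION;     -- issued instead if either API call fails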
I am writing an archival script (in Python using psycopg2) that needs to pull a very large amount of data out of a PostgreSQL database (9.4), process, upload and then delete it from the database.
I start a transaction, execute a select statement to create a named cursor, fetch N rows at a time from the cursor, and process and upload the parts (using S3 multipart upload). Once the cursor is exhausted and no errors have occurred, I finalize the upload and execute a delete statement using the same conditions as in the select. If the delete succeeds, I commit the transaction.
The database is being actively written to, and it is important both that exactly the rows that are archived are the ones that get deleted, and that reads and writes to the database (including the table being archived) continue uninterrupted. That said, the tables being archived contain logs, so existing records are never modified; only new records are added.
So the questions I have are:
What isolation level should I use to ensure that the same rows get archived and deleted?
What impact will these operations have on database read/write ability? Does anything get write or read locked in the process I described above?
You have two good options:
Get the data with
SELECT ... FOR UPDATE
so that the rows get locked. Then they are guaranteed to still be there when you delete them.
Use
DELETE FROM ... RETURNING *
Then insert the returned rows into your archive.
The second solution is better, because you need only one statement.
Nothing bad can happen. If the transaction fails for whatever reason, no row will be deleted.
You can use the default READ COMMITTED isolation level for both solutions.
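A minimal sketch of the second option, with invented table and column names; the deleted rows come back to the client as part of the same statement, so the upload can happen before the final COMMIT:

    BEGIN;

    -- deletes and returns the archived rows in a single statement
    DELETE FROM app_logs
    WHERE  logged_at < '2015-01-01'
    RETURNING *;

    -- fetch the returned rows on the client, upload them (e.g. S3 multipart),
    -- and only then:
    COMMIT;

    -- If the archive target were another table instead of S3, the same idea
    -- works entirely inside the database:
    -- WITH deleted AS (
    --     DELETE FROM app_logs WHERE logged_at < '2015-01-01' RETURNING *
    -- )
    -- INSERT INTO app_logs_archive SELECT * FROM deleted;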
We insert SQL entities into a table one by one. That part is easy and fast. After the entity insert, we execute an SP that updates several tables according to the new entity: it refreshes some calculated fields and some lookup tables that help locate the new entity. This takes a lot of time and sometimes ends in a deadlock.
Inserting the main entity must be fast and reliable; updating the additional tables does not need to happen immediately. I was wondering (I am not a DB expert) whether there is a SQL mechanism similar to thread handling in C#: a dedicated update thread that is woken whenever a new entity arrives and updates the additional tables after the insertion. Doing all of these updates in "one thread" would avoid the deadlocks.
I can imagine a SQL job that runs every minute, searches for new entities and executes the updates, but that seems too crude to me.
What is the best practice to implement this on MS SQL side?
There are a number of ways you could achieve this. You mention that the two steps can be done separately - immediate updating is not important. In that case, you could set up a SQL Agent job to run a stored procedure that checks for missing records and performs the update.
Another approach would be to put the entire original update inside a stored procedure responsible for performing the update and all the housekeeping work; then all you would do is call the stored procedure with the right parameters and it would do all the work behind the scenes.
Another way would be to add triggers on the table being inserted into to do the update for you. It sounds like the first option is what you want; a rough sketch of it follows.
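Here is a rough sketch of the stored procedure such a SQL Agent job could run on a schedule; every table and column name is invented for illustration:

    -- Hypothetical names throughout: dbo.Entity is the main table,
    -- dbo.EntityLookup is one of the additional tables to be maintained.
    CREATE PROCEDURE dbo.ProcessNewEntities
    AS
    BEGIN
        SET NOCOUNT ON;

        -- pick up entities that have not been propagated yet
        INSERT INTO dbo.EntityLookup (EntityId, SearchKey)
        SELECT e.EntityId, e.Name
        FROM   dbo.Entity AS e
        WHERE  NOT EXISTS (SELECT 1
                           FROM   dbo.EntityLookup AS l
                           WHERE  l.EntityId = e.EntityId);

        -- updates of calculated fields would follow the same pattern
    END;

Because the job runs in a single session, the follow-up updates are serialized, which removes the deadlocks between concurrent inserters.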
Reading the documentation of the PL/SQL CREATE TRIGGER statement in Oracle, I came across the following bit of information:
When a trigger fires, tables that the trigger references might be undergoing changes made by SQL statements in other users' transactions. SQL statements running in triggers follow the same rules that standalone SQL statements do.
It basically says the rules that would apply to two conflicting standalone SQL statements (running at the same time) are unchanged when one of the statements is performed from within a trigger.
So there are some "usual" rules about concurrent transactions, and two of them are mentioned specifically:
Queries in the trigger see the current read-consistent materialized view of referenced tables and any data changed in the same transaction.
Updates in the trigger wait for existing data locks to be released before proceeding.
These two rules look rather obscure to non-expert users.
What do they mean more precisely?
Queries in the trigger see the current read-consistent materialized view of referenced tables and any data changed in the same transaction.
This means that the data the trigger sees - for example, if it does a SELECT on a different table - reflects the state of that table as of when the triggering statement started running. The trigger does not see rows changed by other sessions that have not yet been committed.
Updates in the trigger wait for existing data locks to be released before proceeding.
When an Oracle statement modifies a row, that row is locked against changes by other sessions until the modifying session either commits or rolls back its transaction. So if you do an insert on table A and your trigger does an update on table B, but someone else's session has already updated that same row in table B, your transaction will wait until they commit or roll back.
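As a concrete (entirely made-up) illustration, a trigger like the one below is subject to both rules: any query it ran against table_b would see a read-consistent view of that table, and its UPDATE waits if another session holds a lock on the matching row:

    -- Illustrative only; table and column names are invented.
    CREATE OR REPLACE TRIGGER trg_a_after_insert
    AFTER INSERT ON table_a
    FOR EACH ROW
    BEGIN
        -- blocks here until any other session that has locked the matching
        -- row in table_b commits or rolls back
        UPDATE table_b
        SET    last_a_id = :NEW.id
        WHERE  b_key = :NEW.b_key;
    END;
    /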
Problem statement
I have a view that recursively collects and aggregates information from 3 different large to very large tables. The view itself takes quite a while to execute, but it is needed in many select statements and is executed quite often.
The resulting view, however, is very small (a few dozen rows in 2 columns).
All updating actions typically start a transaction, execute many thousands of INSERTs, and then commit the transaction. This does not occur very frequently, but if something is written to the database, it is usually a large amount of data.
What I tried
As the view is small, does not change frequently and is read often, I thought of creating an indexed view. Sadly, however, you cannot create an indexed view that uses CTEs, let alone recursive CTEs.
To 'emulate' an indexed or materialized view, I thought about writing a trigger that executes the view and stores the results in a table every time one of the base tables gets modified. However, I guess this would take forever if a large number of entries are UPDATEd or INSERTed, since the trigger runs for each INSERT/UPDATE statement on those tables, even if they are inside a single transaction.
Actual question
Is it possible to write a trigger that runs only once per transaction - right before the commit, after the last insert/update statement has finished - and only if any of the statements changed any of the three tables?
No, there's no direct way to make a trigger that runs right before the end of a transaction. DML Triggers run once per triggering DML statement (INSERT, UPDATE, DELETE), and there's no other kind of trigger related to data modification.
Indirectly, you could have all of your INSERTs go into a temporary table and then INSERT them all together from the #temp table into the real table, resulting in one trigger firing for that table. But if you are writing to multiple tables, you would still have the same problem.
The SOP (Standard Operating Practice) way to address this is to have a stored procedure handle everything up front instead of a Trigger trying to catch everything on the back-side.
If data consistency is important, then I'd recommend that you follow the stored-procedure-based SOP approach that I mentioned above. Here's a high-level outline of this approach (a sketch follows the outline):
Use a stored procedure that dumps all of the changes into #temp tables first,
then start a transaction,
then make the changes, moving data/changes from your #temp table(s) into your actual tables,
then do the follow-up work you wanted in a trigger. If these are consistency checks and they fail, you should roll back the transaction.
Otherwise, it then commits the transaction.
This is almost always how something like this is done correctly.
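A bare-bones T-SQL sketch of that outline; every object name below is a placeholder, and the summary refresh stands in for whatever follow-up work would otherwise live in triggers:

    -- Placeholder names throughout.
    CREATE PROCEDURE dbo.LoadBatch
    AS
    BEGIN
        SET NOCOUNT ON;

        -- 1) stage the incoming changes in a #temp table
        SELECT *
        INTO   #staged
        FROM   dbo.IncomingChanges;

        BEGIN TRY
            -- 2) start the transaction
            BEGIN TRANSACTION;

            -- 3) move the staged rows into the real table(s)
            INSERT INTO dbo.BaseTable1 (Col1, Col2)
            SELECT Col1, Col2
            FROM   #staged;

            -- 4) follow-up work, e.g. refreshing the pre-computed view result
            DELETE FROM dbo.ViewSummary;
            INSERT INTO dbo.ViewSummary (KeyCol, AggCol)
            SELECT KeyCol, AggCol
            FROM   dbo.ExpensiveView;

            -- 5) commit only if everything above succeeded
            COMMIT TRANSACTION;
        END TRY
        BEGIN CATCH
            IF @@TRANCOUNT > 0
                ROLLBACK TRANSACTION;
            THROW;
        END CATCH;
    END;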
If your view is small and queried frequently and your underlying tables are rarely changed, you don't need a "view". Instead you need a summary table holding the same result as the view, updated by triggers on each underlying table (a sketch is shown at the end of this answer).
A trigger fires on every data modification (insert, delete or update), but one modification statement fires it only once, whether it touches one record or one million rows. You don't need to worry about the size of the update; the frequency of updates is the real concern.
If you have a procedure that periodically inserts a large number of rows, or updates a large number of rows one by one, you can change that procedure to disable the triggers before the update, so the summary table is refreshed only once near the end of the procedure, where you call the same "sum" procedure and re-enable the triggers.
If you HAVE TO keep the "summary" up to date at all times, even during a large number of transactions (I doubt that is very helpful or practical if your view is slow to execute), you can disable those triggers and instead do the calculation yourself and update the summary table after each transaction, inside your procedure.
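A minimal sketch of such a trigger, with invented names: it simply re-runs the expensive view and replaces the summary table's contents, which is workable precisely because the result is only a few dozen rows. The same trigger would be created on each of the three base tables, and DISABLE TRIGGER / ENABLE TRIGGER can bracket the large batch loads described above.

    -- Illustrative sketch; dbo.ExpensiveView and dbo.ViewSummary are placeholders.
    CREATE TRIGGER trg_BaseTable1_RefreshSummary
    ON dbo.BaseTable1
    AFTER INSERT, UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;

        -- fires once per statement, not once per affected row
        DELETE FROM dbo.ViewSummary;

        INSERT INTO dbo.ViewSummary (KeyCol, AggCol)
        SELECT KeyCol, AggCol
        FROM   dbo.ExpensiveView;   -- the slow recursive-CTE view
    END;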
I have a task that will require me to use a transaction to ensure that either all of the inserts complete or the entire update is rolled back.
I am concerned about the amount of data that needs to be inserted in this transaction and whether this will have a negative effect on the server.
We are looking at about 10,000 records in table1 and 60,0000 records into table2.
Is this safe to do in a single transaction?
Have you thought about using a bulk data loader like SSIS or the Data Import Wizard that comes with SQL Server?
The Data Import Wizard is pretty simple.
In Management Studio, right-click the database you want to import data into, then select Tasks > Import Data and follow the wizard prompts. If a record fails, the whole transaction will fail.
I have loaded millions of records this way (and using SSIS).
It is safe; however, keep in mind that you might be blocking other users during that time. Also take a look at bcp or BULK INSERT to make the inserts faster.
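For reference, a hedged sketch of wrapping both loads in a single transaction, using BULK INSERT as suggested; the table names, file paths and format options are all assumptions:

    -- Sketch only; table names, paths and format options are invented.
    BEGIN TRY
        BEGIN TRANSACTION;

        BULK INSERT dbo.table1
        FROM 'C:\load\table1.csv'
        WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);

        BULK INSERT dbo.table2
        FROM 'C:\load\table2.csv'
        WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);

        COMMIT TRANSACTION;   -- both loads succeed or neither does
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0
            ROLLBACK TRANSACTION;
        THROW;
    END CATCH;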