I am writing an archival script (in Python using psycopg2) that needs to pull a very large amount of data out of a PostgreSQL database (9.4), process, upload and then delete it from the database.
I start a transaction, execute a select statement to create a named cursor, fetch N rows at a time from the cursor and do processing and uploading of parts (using S3 multipart upload). Once the cursor is depleted and no errors occurred, I finalize the upload and execute a delete statement using the same conditions as I did in select. If delete succeeds, I commit the transaction.
The database is being actively written to, and it is important both that exactly the rows that get archived are the ones that get deleted, and that reads and writes to the database (including the table being archived) continue uninterrupted. That said, the tables being archived contain logs, so existing records are never modified; only new records are added.
So the questions I have are:
What isolation level should I use to ensure that the same rows get archived and deleted?
What impact will these operations have on the database's read/write ability? Does anything get write- or read-locked in the process I described above?
You have two good options:
Get the data with
SELECT ... FOR UPDATE
so that the rows get locked. Then they are guaranteed to still be there when you delete them.
Use
DELETE FROM ... RETURNING *
Then insert the returned rows into your archive.
The second solution is better, because you need only one statement.
Nothing bad can happen. If the transaction fails for whatever reason, no row will be deleted.
You can use the default READ COMMITTED isolation level for both solutions.
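A minimal sketch of the second option, assuming a hypothetical event_log table and an archive cutoff date (both names are illustrative, not taken from the question):

BEGIN;

DELETE FROM event_log
WHERE created_at < '2015-06-01'
RETURNING *;

-- Fetch the returned rows in the client, upload them to S3,
-- and only then:
COMMIT;
-- If anything fails before the COMMIT, issue ROLLBACK and no row is deleted.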
Related
Looking for some insights on using transactions versus delete queries when a subsequent request fails. In brief, in my application I'm inserting into two tables by calling two stored procedures, and the inserted data is then uploaded to two REST APIs. If either REST API call fails, I have to roll back the details entered into the database.
So which approach is suitable: using a SQL transaction, or deleting the inserted records through a database procedure?
This is an ideal situation to use a transaction. How do you know?
Let's say you insert some rows, then make the API call, then try to delete the inserted rows. What happens in that case?
The inserted rows are readable right away (even without dirty reads enabled) - they are just normal rows in the database. So every query made before you finish your request will see these rows as well.
What happens if you fail to delete the rows? Exactly - they just stay in the database, and now you have improper data. Bad.
Use the transaction approach: start a transaction and commit it only once the API calls have finished. This way you ensure that your database contains proper data at all times.
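A rough sketch of that flow, assuming SQL Server and hypothetical stored procedure names (the actual procedures and API calls belong to the application):

BEGIN TRANSACTION;

EXEC dbo.InsertOrderHeader @OrderId = 42;   -- hypothetical stored procedure
EXEC dbo.InsertOrderDetails @OrderId = 42;  -- hypothetical stored procedure

-- Call both REST APIs from application code while the transaction is open.
-- If both calls succeed:
COMMIT TRANSACTION;
-- If either call fails:
-- ROLLBACK TRANSACTION;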
I have an issue with my data flow task locking. The task compares a couple of tables from the same server, and the result is inserted into one of the tables being compared. The table being inserted into is compared using a NOT EXISTS clause.
When performing a fast load, the task freezes without errors; when doing a regular insert, the task gives a deadlock error.
I have two other tasks that perform the same action on the same table and they work fine, but the amount of information being inserted is a lot smaller. I am not running these tasks in parallel.
I am considering using a NOLOCK hint to get around this, because this is the only task that writes to a certain table partition. However, I am only coming to this conclusion because I cannot figure out anything else, aside from using a temp table or a hashed anti join.
You probably have a so-called deadlock situation. Your Data Flow Task (DFT) has two separate connection instances to the same table. The first connection instance runs the SELECT and places a shared lock on the table; the second runs the INSERT and places a page or table lock.
A few words on the possible cause. The SSIS DFT reads table rows and processes them in batches. When the number of rows is small, the read completes within a single batch, and the shared lock is released before the insert takes place. When the number of rows is substantial, SSIS splits the rows into several batches and processes them sequentially. This allows the steps following the DFT data source to run before the data source has finished reading.
The design - reading and writing the same table in the same Data Flow - is not good because of possible locking issues. Ways to work around it:
Move all the DFT logic into a single INSERT statement and get rid of the DFT. This might not be possible.
Split the DFT: move the data into an intermediate table, and then move it to the target table with a following DFT or SQL command. An additional table is needed.
Set Read Committed Snapshot Isolation (RCSI) on the database and use Read Committed for the SELECT. Applicable to MS SQL only.
The most universal way is the second, with an additional table. The third is for MS SQL only; a minimal sketch of it follows.
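Assuming a database named MyDb (substitute your own database name), enabling RCSI looks like this:

ALTER DATABASE MyDb
SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK IMMEDIATE;  -- or wait until no other connections are active

-- With RCSI enabled, READ COMMITTED readers use row versions instead of
-- shared locks, so the SELECT side of the Data Flow no longer blocks the INSERT.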
How do I lock a table in SQL Server? I found queries that run with locks and also read transactions, but I am confused about how to use them.
I have two processes which first read a table and then update data in it. I want only one of them to update, and the other to see that update in its read. My processes work as follows:
Lock the table
Read the data
Update the data if it has not been updated by the other process
Release the lock
thanks
You can use the TABLOCKX hint to lock the entire table, but locking the entire table is usually a bad idea; you might want to reconsider whether you really need it.
If you want to ensure you're updating the latest data, you can use a rowversion column and double-check it before the update, instead of locking the entire table for reading.
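A sketch of that double-check, assuming a hypothetical table MyTable with an Id key, a SomeColumn value and a rowversion column named RowVer:

DECLARE @id int = 1,
        @oldVer binary(8),
        @data nvarchar(100);

-- Read the row and remember its current version.
SELECT @oldVer = RowVer, @data = SomeColumn
FROM dbo.MyTable
WHERE Id = @id;

-- ... application logic using @data ...

-- Update only if nobody changed the row in the meantime.
UPDATE dbo.MyTable
SET SomeColumn = N'new value'
WHERE Id = @id
  AND RowVer = @oldVer;

IF @@ROWCOUNT = 0
    PRINT 'Row was changed by the other process; re-read and retry.';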
In your SELECT statement you can provide a "select for update" table hint: WITH (UPDLOCK). Depending on what percentage of records you are updating and their physical distribution, this might perform better than a table lock.
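For example (table and column names are hypothetical), a read-then-update that holds update locks for the duration of the transaction:

DECLARE @id int = 1;

BEGIN TRANSACTION;

-- UPDLOCK prevents other sessions from acquiring update or exclusive locks
-- on these rows until this transaction ends.
SELECT SomeColumn
FROM dbo.MyTable WITH (UPDLOCK, ROWLOCK)
WHERE Id = @id;

UPDATE dbo.MyTable
SET SomeColumn = N'new value'
WHERE Id = @id;

COMMIT TRANSACTION;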
But as Fedor Hajdu pointed out, what you probably want is an optimistic locking scheme. Check out the documentation for the READ COMMITTED SNAPSHOT isolation level. You might also find this article useful as an introduction.
My understanding of deadlocks is that they involve two processes contending for the same resource - typically two processes trying to 'write' to the same row of data. If all one process is doing is reading the data, and the other process is updating the data, how is that resource contention? Yet in our database, which is set to the default transaction level, ReadCommitted, we are seeing several deadlock exceptions.
The ReadCommitted definition: data that has been modified (but not yet committed) cannot be read. That is fine - but should SQL Server throw a deadlock exception if it encounters this 'dirty read' taking place?
Anybody have real world experience with this scenario? I found a blog post (by the stackoverflow developer, no less :) claiming that this might be true.
The ReadCommitted transaction isolation level initially obtains a shared lock on a resource while reading a row, but when we try to UPDATE the row it obtains an exclusive lock on the resource. Multiple users can hold shared locks on the same rows without any problem, but as soon as one user tries to update a row it needs an exclusive lock on it. Imagine a scenario where User1 and User2 both hold shared locks, and when they try to update some records each acquires an exclusive lock on rows that the other user needs in order to commit its transaction. This results in a deadlock.
In the case of a deadlock, if the deadlock priority is not set, SQL Server will wait for some time and then roll back whichever transaction is cheaper to roll back.
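For illustration, a session whose work is cheap to redo can volunteer itself as the victim:

-- Run in the session that is cheapest to roll back; on a deadlock,
-- SQL Server will prefer to kill this session (it receives error 1205).
SET DEADLOCK_PRIORITY LOW;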
Edit
Yes - if User1 is only reading data, User2 tries to update some data, and there is a non-clustered index on that table, it is possible.
User1 reads some data and obtains a shared lock on the non-clustered index in order to perform a lookup, and then tries to obtain a shared lock on the page containing the data in order to return the data itself.
User2, who is writing/updating, first obtains an exclusive lock on the database page containing the data, and then attempts to obtain an exclusive lock on the index in order to update the index.
Yes, it can happen. Imagine you have two processes, each with its own transaction. The first updates TableA and then tries to update TableB. The second updates TableB and then tries to update TableA. If you're unlucky, both processes manage to complete their first step and then each waits indefinitely for the other in order to complete the second step.
Incidentally, that's one of the most common ways to avoid deadlocks: be consistent in the order in which you update your tables. If both processes updated TableA first and then TableB, the deadlock wouldn't occur.
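A sketch of that interleaving with two hypothetical tables, TableA and TableB:

-- Session 1
BEGIN TRAN;
UPDATE TableA SET Col = 1 WHERE Id = 1;
UPDATE TableB SET Col = 1 WHERE Id = 1;  -- blocks if session 2 already updated TableB
COMMIT;

-- Session 2 (deadlock-prone: opposite order)
BEGIN TRAN;
UPDATE TableB SET Col = 2 WHERE Id = 1;
UPDATE TableA SET Col = 2 WHERE Id = 1;  -- deadlock: each session waits for the other
COMMIT;

-- Fix: have session 2 update TableA first and TableB second, like session 1.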
I need to delete all rows of a table containing a bunch of files. Doing a simple DELETE FROM will more or less lock up the computer because of the sheer number of rows and the size of the files. I'm looking to create a SQL script that will accomplish this task without locking up my computer. Can anyone point me in the right direction?
Thank you in advance.
Did you try TRUNCATE? It will delete everything.
TRUNCATE TABLE yourTable
From MSDN:
TRUNCATE TABLE is similar to the DELETE statement with no WHERE clause; however, TRUNCATE TABLE is faster and uses fewer system and transaction log resources.
Compared to the DELETE statement, TRUNCATE TABLE has the following advantages:

Less transaction log space is used.
The DELETE statement removes rows one at a time and records an entry in the transaction log for each deleted row. TRUNCATE TABLE removes the data by deallocating the data pages used to store the table data and records only the page deallocations in the transaction log.

Fewer locks are typically used.
When the DELETE statement is executed using a row lock, each row in the table is locked for deletion. TRUNCATE TABLE always locks the table and page but not each row.

Without exception, zero pages are left in the table.
After a DELETE statement is executed, the table can still contain empty pages. For example, empty pages in a heap cannot be deallocated without at least an exclusive (LCK_M_X) table lock. If the delete operation does not use a table lock, the table (heap) will contain many empty pages. For indexes, the delete operation can leave empty pages behind, although these pages will be deallocated quickly by a background cleanup process.