Lock the newly inserted row [PostgreSQL] - sql

Is there a way to lock a recently inserted row so that other transactions will not see it while my current transaction is still in progress?

You don't have to. In fact, you can't have it any other way: there is no way to make your newly inserted row visible to other transactions until your transaction commits.
Although the row is not visible to concurrent SELECTs, it can still affect concurrent INSERTs (or UPDATEs). Specifically, if you try to insert the same value into a unique index in two different transactions, one will block until the other commits or rolls back. Then it decides whether it needs to raise a unique-violation error, or whether it can continue. So while you cannot directly see uncommitted data, you can sometimes see its side effects.
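To make that side effect concrete, here is a small sketch using Python's stdlib sqlite3 module (the table name and values are invented for illustration; in PostgreSQL, the second INSERT would additionally block on the uncommitted row before deciding):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (email TEXT UNIQUE)")
conn.execute("INSERT INTO accounts (email) VALUES ('a@example.com')")
conn.commit()

# A second insert of the same value violates the unique constraint.
# In PostgreSQL, if the first insert were still uncommitted, this
# statement would block until that transaction commits or rolls back,
# and only then either raise or proceed.
try:
    conn.execute("INSERT INTO accounts (email) VALUES ('a@example.com')")
    duplicate_inserted = True
except sqlite3.IntegrityError:
    duplicate_inserted = False
```

The constraint, not a SELECT-then-INSERT check in the application, is what makes the guarantee hold under concurrency.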

Related

SQL unique field: concurrency bugs? [duplicate]

This question already has answers here:
Only inserting a row if it's not already there
(7 answers)
Closed 9 years ago.
I have a DB table with a field that must be unique. Let's say the table is called "Table1" and the unique field is called "Field1".
I plan on implementing this by performing a SELECT to see if any Table1 records exist where Field1 = #valueForField1, and only updating or inserting if no such records exist.
The problem is, how do I know there isn't a race condition here? If two users both click Save on the form that writes to Table1 (at almost the exact same time), and they have identical values for Field1, isn't it possible that the following would happen?
1. User1 makes a SQL call, which performs the SELECT and determines there are no existing records where Field1 = #valueForField1.
2. User1's process is preempted by User2's process, which also finds no records where Field1 = #valueForField1, and performs an insert.
3. User1's process is allowed to run again, and inserts a second record where Field1 = #valueForField1, violating the requirement that Field1 be unique.
How can I prevent this? I'm told that transactions are atomic, but then why do we need table locks too? I've never used a lock before and I don't know whether or not I need one in this case. What happens if a process tries to write to a locked table? Will it block and try again?
I'm using MS SQL 2008R2.
Add a unique constraint on the field. That way you won't have to SELECT; you will only have to insert. The first user will succeed and the second will fail.
On top of that, you may make the field auto-incremented so you don't have to fill it in yourself, or you may give it a default value.
Some options would be an auto-incremented INT field, or a unique identifier.
You can add a unique constraint. Example from http://www.w3schools.com/sql/sql_unique.asp:
CREATE TABLE Persons
(
P_Id int NOT NULL UNIQUE
)
EDIT: Please also read Martin Smith's comment below.
jyparask has a good answer on how you can tackle this specific problem. However, I would like to elaborate on your confusion over locks, transactions, blocking, and retries. For the sake of simplicity, I'm going to assume transaction isolation level serializable.
Transactions are atomic. The database guarantees that if you have two transactions, all operations in one transaction appear to occur completely before the next one starts, no matter what race conditions exist. Even if two users access the same row at the same time (on multiple cores), there is no chance of a race condition, because the database will ensure that one of them waits or fails.
How does the database do this? With locks. When you select a row, SQL Server will lock the row, so that all other clients will block when requesting that row. Block means that their query is paused until that row is unlocked.
The database actually has a couple of things it can lock. It can lock the row, or the table, or somewhere in between. The database decides what it thinks is best, and it's usually pretty good at it.
There is never any retrying. The database will never retry a query for you; you need to explicitly tell it to retry. The reason is that the correct behavior is hard to define. Should the query retry with the exact same parameters, or should something be modified first? Is it even still safe to retry? It's much safer for the database to simply throw an exception and let you handle it.
Let's address your example. Assuming you use transactions correctly and do the right query (Martin Smith linked to a few good solutions), then the database will create the right locks so that the race condition disappears. One user will succeed, and the other will fail. In this case, there is no blocking, and no retrying.
In the general case with transactions, however, there will be blocking, and you get to implement the retrying.
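That retry logic lives in application code. A minimal sketch in Python, with SQLite (stdlib) standing in for SQL Server and sqlite3.OperationalError standing in for a transient deadlock or lock-timeout error; the table and helper names are invented:

```python
import sqlite3
import time

def run_with_retry(fn, attempts=3, delay=0.05):
    """Run fn(); retry on a transient database error.

    The database never retries for you: it raises, and the caller
    decides whether re-running the transaction is safe.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except sqlite3.OperationalError:
            # Transient error (lock timeout, deadlock victim, ...).
            if attempt == attempts:
                raise
            time.sleep(delay)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Field1 TEXT UNIQUE)")

def save_record():
    with conn:  # one transaction per attempt; commits on success
        conn.execute("INSERT INTO Table1 (Field1) VALUES ('value1')")
    return "saved"

result = run_with_retry(save_record)
```

Note that each attempt is a fresh transaction; retrying half of a transaction is exactly the ill-defined behavior the answer warns about.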

Batch Job Transaction

Let's say there are 100 records in the database for a batch job. The batch job runs, picks up those 100 records, and starts processing them. If an error occurs at the 10th record, should I roll back the 9 records that have already been processed?
How can we design this scenario? Your suggestions are welcome.
I believe you're asking if you should roll back successful records if an error occurs part-way through batch processing.
You want your DB updates to occur in transactions in a way that leaves database records consistent and legal (with respect to DB and business rules) after each transaction is committed or rolled back.
If each item in your list of 100 records can be processed and recorded individually, then I'd suggest using a flag of some sort (this could be a status field, as well) to indicate whether each record has been processed, then loop through the records to update each one. If you encounter an error, note that somewhere (log file, exception, error table... your call) and move on. When you're done, you'll have logged which records were successful and which failed. You should then be able to go back and fix whatever caused the problem(s) on bad records and re-process the skipped records.
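A minimal sketch of that flag-based approach in Python with the stdlib sqlite3 module (the table layout, payloads, and failing work step are all invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE batch (id INTEGER PRIMARY KEY, payload TEXT,"
    " status TEXT DEFAULT 'pending')"
)
conn.executemany("INSERT INTO batch (payload) VALUES (?)",
                 [("ok",), ("bad",), ("ok",)])
conn.commit()

def process(payload):
    # Hypothetical work step; fails on one record to simulate the
    # "error at the 10th record" scenario.
    if payload == "bad":
        raise ValueError("cannot process record")

failures = []
for rec_id, payload in conn.execute(
        "SELECT id, payload FROM batch WHERE status = 'pending'").fetchall():
    try:
        process(payload)
        conn.execute("UPDATE batch SET status = 'done' WHERE id = ?",
                     (rec_id,))
    except ValueError as exc:
        failures.append((rec_id, str(exc)))  # log it and move on
        conn.execute("UPDATE batch SET status = 'failed' WHERE id = ?",
                     (rec_id,))
    conn.commit()  # commit per record: earlier successes are kept
```

After the run, the status column tells you exactly which records to fix and re-process.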
If all 100 of your records must succeed or fail together, then you'll need to wrap your updates in a transaction so they all succeed or fail as one. This will work for dozens of records or (maybe) hundreds of records, but trying to scale up to thousands of records in the same transaction could create scalability problems (performance and contention issues), so you'd want a different solution for a pattern like that.
Different transaction granularities are possible. Java (JTA) allows for multiple writes in a single commit:
Open transaction
write record
write record
write record
error
roll-back
Most databases also support transactions spanning multiple row or record writes.
This is very common.
Take a look at SAVEPOINTs - they basically allow transactions within a transaction. So you can do quite a lot of work, create a SAVEPOINT, do more work, and roll back only to the last savepoint. If things go really wrong, you can roll the entire transaction back.
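A small demonstration with Python's stdlib sqlite3 module, which supports the same SAVEPOINT syntax (the table and savepoint names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # manage transactions manually
conn.execute("CREATE TABLE log (msg TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO log VALUES ('step 1')")
conn.execute("SAVEPOINT sp1")  # inner mark within the open transaction
conn.execute("INSERT INTO log VALUES ('step 2')")
# Something goes wrong in step 2: undo back to the savepoint only.
# 'step 1' is untouched and the outer transaction stays open.
conn.execute("ROLLBACK TO SAVEPOINT sp1")
conn.execute("INSERT INTO log VALUES ('step 2 retried')")
conn.execute("COMMIT")

rows = [r[0] for r in conn.execute("SELECT msg FROM log ORDER BY rowid")]
# rows is ['step 1', 'step 2 retried']
```

A full ROLLBACK at any point before the COMMIT would instead discard everything, including 'step 1'.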

Are Transactions Always Atomic?

I'm trying to better understand a nuance of SQL Server transactions.
Say I have a query that updates 1,000 existing rows, updating one of the columns to have the values 1 through 1,000. It's possible to execute this query and, when completed, those rows would not be numbered sequentially. This is because it's possible for another query to modify one of those rows before my query finishes.
On the other hand, if I wrap those updates in a transaction, that guarantees that if any one update fails, I can fail all updates. But does it also mean that those rows would be guaranteed to be sequential when I'm done?
In other words, are transactions always atomic?
But does it also mean that those rows would be guaranteed to be sequential when I'm done?
No. This has nothing to do with transactions, because what you're asking for simply doesn't exist: relational tables have no order, and asking for 'sequential rows' is the wrong question. You can rephrase the question as 'will the 1,000 updated rows contain the entire sequence from 1 to 1,000, without gaps?' Most likely yes, but in truth there could be gaps depending on how you do the updates. Those gaps would appear not because updated rows are modified after the update and before commit, but because an update can be a no-op (update no row at all), which is a common problem with read-modify-write-back style updates (the row 'vanishes' between the read and the write-back due to concurrent operations).
To answer more precisely whether your code is correct, you would have to post the exact code you use for the update, as well as the exact table structure, including all indexes.
Atomic means the operations within the transaction will either all occur, or none of them will.
If one of the 1,000 statements fails, none of the operations within the transaction will commit. If you instead batch the statements into smaller transactions (say, 100 statements each), then the batches leading up to the error can commit: with an error at the 501st statement, the first five batches (statements 1-500) commit, the batch containing the error does not, and the later batches (601 onward) can still commit if processing continues past the error.
But does it also mean that those rows would be guaranteed to be sequential when I'm done?
You'll have to provide more context about what you're doing in a transaction to be "sequential".
The 2 points are unrelated
Sequential
If you insert values 1 to 1000 into some column, they will come back sequential with a WHERE and an ORDER BY on that column to limit you to these 1000 rows. Unless there are duplicates, in which case you'd need a unique constraint.
If you rely on an IDENTITY, it isn't guaranteed: Do Inserted Records Always Receive Contiguous Identity Values.
Atomicity
All transactions are atomic:
Is neccessary to encapsulate a single merge statement (with insert, delete and update) in a transaction?
SQL Server and connection loss in the middle of a transaction
Does it delete partially if execute a delete statement without transaction?
SQL transactions, like transactions on all database platforms, put the data in isolation to cover the entire ACID acronym (atomic, consistent, isolated and durable). So the answer is yes.
A transaction guarantees atomicity. That is the point.
Your problem is that after you do the insert, the rows are only "sequential" until the next thing comes along and touches one of the new records.
If another step in your process requires them to still be sequential, then that step, too, needs to be within your original transaction.

race condition UPDATE modification of credit column - what happens on rollback?

OK, I tried searching and have not found an answer to this. I am curious how ROLLBACK handles race conditions. For example:
If I have a table (CompanyAccount) that keeps track of how many credits a company has available for purchase (there is only one row per company), and there are potentially multiple users from the same company who can decrement credits from the single company account, what happens in case of an error when a ROLLBACK occurs?
Example:
Assumptions: I have written the update properly to calculate the new "Credit" balance instead of guessing what it is (i.e. we don't tell the UPDATE statement what the new Credit value is; we tell it to take whatever is in the Credit column and subtract the decrement amount)...
here is an example of how the update statement is written:
UPDATE dbo.CompanyAccount
SET Credit = Credit - #DecrementAmount
WHERE CompanyAccountId = #CompanyAccountId
If the "Credit" column has 10,000 credits. User A causes a decrement of 4,000 credits and User B causes a decrement of 1000 credits. For some reason a rollback is triggered during User A's decrement (there are about a 1/2 dozen more tables with rows getting INSERTED during the TRANSACTION). If User A wins the race condition and the new balance is 6,000 (but not yet COMMIT'ed) what happens if User B's decrement occurs before the rollback is applied? does the balance column go from 6,000 to 5,000 and then gets ROLLBACK to 10,000?
I am not too clear on how the ROLLBACK will handle this. Perhaps I am over-simplifying. Can someone please tell me if I misunderstand how ROLLBACK will work or if there are other risks I need to worry about for this style.
Thanks for your input.
In the example you have given there will be no problem.
The first transaction will hold an exclusive lock, meaning the second one cannot modify that row until after the first one has committed or rolled back. It will just have to wait (blocked) until the lock is released.
It gets a bit more complicated if you have multiple statements. You should probably read up on different isolation levels and how they can allow or prevent such phenomena as "lost updates".
Rollback is part of the transaction and locks will be maintained during the rollback. The *A*tomic in ACID.
User B will not start until all locks are released.
What happens:
User A locks rows
User B won't see the rows until locks are released
User A rolls back, releasing the locks; the changes never happened.
User B's update now runs against the original balance: 10,000 - 1,000 results in 9,000.
However, if User B has already read the balance, then it may be inconsistent by the time the UPDATE runs. It depends on what you're actually doing and in what order, hence the need to understand isolation levels (and the issues with phantom and non-repeatable reads).
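That stale-read hazard is the classic "lost update", and it can be shown even without real concurrency by interleaving the reads and writes by hand. A sketch in Python with the stdlib sqlite3 module, reusing the question's table and figures, contrasted with the relative-update style from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CompanyAccount"
             " (CompanyAccountId INTEGER PRIMARY KEY, Credit INTEGER)")
conn.execute("INSERT INTO CompanyAccount VALUES (1, 10000)")
conn.commit()

def balance():
    return conn.execute(
        "SELECT Credit FROM CompanyAccount WHERE CompanyAccountId = 1"
    ).fetchone()[0]

# Read-modify-write: both users read 10,000 before either writes.
seen_by_a = balance()
seen_by_b = balance()
conn.execute("UPDATE CompanyAccount SET Credit = ?"
             " WHERE CompanyAccountId = 1", (seen_by_a - 4000,))
conn.execute("UPDATE CompanyAccount SET Credit = ?"
             " WHERE CompanyAccountId = 1", (seen_by_b - 1000,))
after_lost_update = balance()   # 9000: User A's decrement was silently lost

# Relative update (the question's style): both decrements survive.
conn.execute("UPDATE CompanyAccount SET Credit = 10000"
             " WHERE CompanyAccountId = 1")
conn.execute("UPDATE CompanyAccount SET Credit = Credit - 4000"
             " WHERE CompanyAccountId = 1")
conn.execute("UPDATE CompanyAccount SET Credit = Credit - 1000"
             " WHERE CompanyAccountId = 1")
after_relative = balance()      # 5000: correct final balance
```

This is why the question's SET Credit = Credit - #DecrementAmount form is the safe one: the read and the write happen inside a single locked statement.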
An alternative to SERIALIZABLE or REPEATABLE READ may be to use sp_getapplock in transaction mode to semaphore parts of the transaction.

Postgresql Concurrency

In a project I'm working on, there's a table with an "on update" trigger that monitors whether a boolean column has changed (e.g. false -> true = do some action). But this action can only be done once per row.
There will be multiple clients accessing the database, so I have to assume that eventually multiple clients will try to update the same row's column in parallel.
Does the "update" trigger itself handle the concurrency itself, or I need to do it in a transaction and manually lock the table?
Triggers don't handle concurrency, and PostgreSQL should do the right thing whether or not you use explicit transactions.
PostgreSQL uses row-level locking (under MVCC), which means the first person to actually update the row takes a lock on that row. If a second person tries to update it, their UPDATE statement waits to see whether the first commits their change or rolls back.
If the first person commits, what happens to the second depends on the isolation level: under REPEATABLE READ or SERIALIZABLE they get a serialization error, rather than their change going through and obliterating a change that might have been interesting to them; under the default READ COMMITTED, their UPDATE re-evaluates against the newly committed row and proceeds, so your trigger can see the already-changed value and skip the action.
If the first person rolls back, the second person's update un-blocks, and goes through normally, because now it's not going to overwrite anything.
The second person can also take the row lock first with SELECT ... FOR UPDATE NOWAIT, which makes the error happen immediately instead of blocking if their update conflicts with an unresolved change.