How do you handle stale data with multiple threads? (SQL)

Let's say I have the following pseudocode:
SELECT count(*) FROM users WHERE email = 'bob@gmail.com'
>>>> MARKER A
if (count > 0) return;
else INSERT INTO users VALUES ('bob@gmail.com')
So essentially only insert the email if it doesn't exist already. I understand there's probably some sort of INSERT IF NOT EXISTS query I could use, but let's say we use this example.
So if the code above runs on thread A, and thread B actually inserts 'bob@gmail.com' into users at MARKER A, then thread A has "stale data" and will try to insert 'bob@gmail.com', thinking the count is still 0, but in fact it is now 1. This will error out since we have a unique index on the email.
What is the tool I should use to prevent this issue? From my reading about transactions, they basically make a set of operations atomic, so the code above will execute completely or not at all. It will NOT ensure the users table is locked against updates, correct? So I can't just wrap the code above in a transaction and make it thread-safe?
Should I implement application-level locking? Should I ensure that when this operation occurs, it must acquire the lock to access the users table so that no other thread can make changes to it? I feel that locking the entire table is a performance hit I want to avoid.

Checking before inserting is a known anti-pattern in multi-threaded applications. Do not even try it.
The right way of doing it is letting the database take care of it. Add a UNIQUE constraint on the column, as in:
alter table users add constraint uq1 unique(email);
Just try to insert the row in the database. If it succeeds, all is good; if it fails, then some other thread has already inserted the row.
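For example (a minimal sketch; the explicit column list is an assumption, and the exact duplicate-key error code varies by DBMS):
INSERT INTO users (email) VALUES ('bob@gmail.com');
-- A duplicate-key error here just means another thread inserted first:
-- catch it and treat it as "already exists".
-- On MySQL, INSERT IGNORE turns the duplicate into a silent no-op:
INSERT IGNORE INTO users (email) VALUES ('bob@gmail.com');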
Alternatively, you could issue a LOCK on the whole table. That would also work, but the performance of your application would become horrible.

Prevent other sessions from reading data until I'm finished
I have a table that holds customers from different companies, something like:
CUSTOMER
CUSTOMER_ID
COMPANY_ID
CUSTOMER_NAME
FOO_CODE
When I insert or update a customer I need to calculate a FOO_CODE based on existing ones (within the company).
If I simply do this:
SELECT MAX(FOO_CODE) AS GREATEST_CODE_SO_FAR
FROM CUSTOMER
WHERE COMPANY_ID=:company_id
... then generate the code in the client language (PHP), and finally issue the INSERT/UPDATE, I understand I can face a race condition if another program instance fetches the same GREATEST_CODE_SO_FAR.
Is it possible to issue a row-level lock on the table so other sessions that attempt to read the FOO_CODE column of any customer that belongs to a given company are delayed until I commit or rollback my transaction?
My failed attempts:
This:
SELECT MAX(FOO_CODE)
FROM CUSTOMER
WHERE COMPANY_ID=:company_id
FOR UPDATE
... triggers:
ORA-01786: FOR UPDATE of this query expression is not allowed
This:
SELECT FOO_CODE
FROM CUSTOMER
WHERE COMPANY_ID=:company_id
FOR UPDATE
... retrieves all company rows and does not even prevent other sessions from reading data.
LOCK TABLE... well, the documentation barely has any examples and I can't figure out the syntax.
P.S. It is not an incrementing number, it's an alphanumeric string.
You can't block another session from reading data, as far as I'm aware. One of the differences between Oracle and some other databases is that writers don't block readers.
I'd probably look at this slightly differently. I'm assuming the way you generate the next foo_code is deterministic. If you add a unique index on (company_id, foo_code), then you can have your application attempt the insert in a loop:
get your current max value
calculate your new code
do the insert
if you don't get a constraint violation, break out of the loop
otherwise continue to the next iteration of the loop and repeat the process
If two sessions attempt this at the same time then the second one will attempt to insert the same foo_code and will get a unique constraint violation. That is trapped and handled nicely and it just tries again; potentially multiple times until it gets a clean insert.
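Sketched in Oracle SQL (the constraint name and the INSERT's column list are assumptions):
ALTER TABLE customer ADD CONSTRAINT uq_company_foo UNIQUE (company_id, foo_code);
INSERT INTO customer (customer_id, company_id, customer_name, foo_code)
VALUES (:customer_id, :company_id, :customer_name, :new_foo_code);
-- ORA-00001 (unique constraint violated) means another session took this
-- foo_code first: re-read MAX(FOO_CODE), regenerate, and try the INSERT again.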
You could have a DB procedure that attempts the insert in a loop, but since you want to generate the new value in PHP then it would make sense for the loop to be in PHP too, attempting a simple insert.
This doesn't necessarily scale well if you have a high volume of inserts and clashes are likely. But if you're expecting simultaneous inserts for the same company to be rare, and just have to handle the odd occasion when it does happen, this won't add much overhead.

Rails ActiveRecord - how can I lock a table for reading?

I have some Rails ActiveRecord code that looks like this:
new_account_number = Model.maximum(:account_number)
# Some processing that usually involves incrementing
# the new account number by one.
Model.create(foo: 12, bar: 34, account_number: new_account_number)
This code works fine on its own, but I have some background jobs that are processed by DelayedJob workers. There are two workers, and if they both start processing a batch of jobs that deal with this code, they end up creating new Model records that have the same account_number, because of the delay between finding the maximum and creating a new record with the incremented account number.
For now, I have solved it by adding a uniqueness constraint at database level to the models table and then retry by re-selecting the maximum in case this constraint triggers an exception.
However it feels like a hack.
Adding auto incrementing at database level to the account_number column is not an option, because the account_number assigning entails more than just incrementing.
Ideally I would like to lock the table in question for reading, so no one else can execute the maximum select query against the table until I am done. However, I'm not sure how to go about that. I'm using PostgreSQL.
Based on the ActiveRecord::Locking docs it looks like Rails doesn't provide a built-in API for table-level locks.
But you can still do this with raw SQL. For Postgres, this looks like
ActiveRecord::Base.transaction do
  ActiveRecord::Base.connection.execute('LOCK table_name IN ACCESS EXCLUSIVE MODE')
  ...
end
The lock must be acquired within a transaction, and is automatically freed once the transaction ends.
Note that the SQL you use here will be different depending on your database.
Obviously locking the entire table is not elegant or efficient, but for small apps, for some time, it may indeed be the best solution. It's simple and easy to reason about. In general, an advisory lock is a better fit for this kind of data race.
There are already answers on how to lock the entire table, but I believe you should try to avoid that. Instead I believe you should give advisory locks a look. It makes sure the same block of code isn't executed on two machines simultaneously, while still keeping the table open for other business.
It still uses the database, but it doesn't lock your tables.
You can use the gem called "with_advisory_lock" like this:
Model.with_advisory_lock("ADVISORY_LOCK_NAME") do
  # Your code
end
https://github.com/ClosureTree/with_advisory_lock
It doesn't work with SQLite.
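Under the hood on Postgres this boils down to something like the following (a minimal sketch; the lock key 42 is an arbitrary example, one per logical resource):
BEGIN;
SELECT pg_advisory_xact_lock(42);  -- blocks until no other session holds key 42
-- SELECT MAX(account_number) ..., compute the new number, INSERT ...
COMMIT;  -- a transaction-scoped advisory lock is released automatically here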
Setting a unique constraint IS NOT a hack. It is the thing that makes your data consistent.
By the way you have a few more options here:
Lock some DB resource (e.g. it could be a unique record) using SELECT FOR UPDATE or PostgreSQL's Advisory Locks (see docs).
Use a sequence (docs).
The main difference between the two approaches is that #1 does not allow gaps in your numbers, because other sessions will wait for the transaction to commit, while #2 does allow gaps.
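A minimal PostgreSQL sketch of option #2 (the sequence name is an assumption):
CREATE SEQUENCE account_number_seq;
SELECT nextval('account_number_seq');  -- never blocks other sessions, but gaps are possible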
You don't have to lock the whole table to restrict a piece of code to a single process at a time; locking a full table causes performance problems. You can lock the same single row every time with the with_lock method. That way the code is fully protected and no extra gem is needed. It also creates a transaction. Like this:
m = Model.order(:id).first   # the same row every time
m.with_lock do   # acquire lock (opens a transaction)
  # some code here, for a single process at a time
end   # release lock
Well, technically, locking a table and always locking a record of another table before accessing the first one amount to the same thing.
So you may have another table with at most one record, and always lock that record with http://api.rubyonrails.org/classes/ActiveRecord/Locking/Pessimistic.html before reading from or writing to the table you want to lock:
LockTable.last.with_lock do
  # the things that are needed for your table
end

SQL unique field: concurrency bugs? [duplicate]

This question already has answers here:
Only inserting a row if it's not already there (7 answers)
I have a DB table with a field that must be unique. Let's say the table is called "Table1" and the unique field is called "Field1".
I plan on implementing this by performing a SELECT to see if any Table1 records exist where Field1 = @valueForField1, and only updating or inserting if no such records exist.
The problem is, how do I know there isn't a race condition here? If two users both click Save on the form that writes to Table1 (at almost the exact same time), and they have identical values for Field1, isn't it possible that the following would happen?
User1 makes a SQL call, which performs the select operation and determines there are no existing records where Field1 = @valueForField1.
User1's process is preempted by User2's process, which also finds no records where Field1 = @valueForField1, and performs an insert.
User1's process is allowed to run again, and inserts a second record where Field1 = @valueForField1, violating the requirement that Field1 be unique.
How can I prevent this? I'm told that transactions are atomic, but then why do we need table locks too? I've never used a lock before and I don't know whether or not I need one in this case. What happens if a process tries to write to a locked table? Will it block and try again?
I'm using MS SQL 2008R2.
Add a unique constraint on the field. That way you won't have to SELECT; you will only have to insert. The first user will succeed, the second will fail.
On top of that, you may make the field auto-incremented so you won't have to worry about filling it, or you may add a default value, again so you don't have to fill it.
Some options would be an autoincremented INT field, or a unique identifier.
You can add a unique constraint. Example from http://www.w3schools.com/sql/sql_unique.asp:
CREATE TABLE Persons
(
    P_Id int NOT NULL UNIQUE
)
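Since Table1 already exists, the equivalent for the question's schema would be (the constraint name is an assumption):
ALTER TABLE Table1
ADD CONSTRAINT UQ_Table1_Field1 UNIQUE (Field1);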
EDIT: Please also read Martin Smith's comment below.
jyparask has a good answer on how you can tackle this specific problem. However, I would like to elaborate on your confusion over locks, transactions, blocking, and retries. For the sake of simplicity, I'm going to assume transaction isolation level serializable.
Transactions are atomic. Under serializable isolation, the database guarantees that if you have two transactions, they behave as if all operations in one occurred completely before the other started, no matter what kind of race conditions there are. Even if two users access the same row at the same time (multiple cores), there is no chance of a race condition, because the database will ensure that one of them waits or fails.
How does the database do this? With locks. When you select a row, SQL Server will lock the row, so that all other clients will block when requesting that row. Block means that their query is paused until that row is unlocked.
The database actually has a couple of things it can lock. It can lock the row, or the table, or somewhere in between. The database decides what it thinks is best, and it's usually pretty good at it.
There is never any retrying. The database will never retry a query for you. You need to explicitly tell it to retry a query. The reason is because the correct behavior is hard to define. Should a query retry with the exact same parameters? Or should something be modified? Is it still safe to retry the query? It's much safer for the database to simply throw an exception and let you handle it.
Let's address your example. Assuming you use transactions correctly and do the right query (Martin Smith linked to a few good solutions), then the database will create the right locks so that the race condition disappears. One user will succeed, and the other will fail. In this case, there is no blocking, and no retrying.
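For instance, one of those patterns is the UPDLOCK/HOLDLOCK guard (a sketch; the single-column INSERT is an assumption):
BEGIN TRAN;
-- UPDLOCK + HOLDLOCK makes the existence check take and hold a range lock,
-- so a concurrent transaction running the same check blocks until we commit.
IF NOT EXISTS (SELECT 1 FROM Table1 WITH (UPDLOCK, HOLDLOCK)
               WHERE Field1 = @valueForField1)
    INSERT INTO Table1 (Field1) VALUES (@valueForField1);
COMMIT TRAN;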
In the general case with transactions, however, there will be blocking, and you get to implement the retrying.

SQL Server 2008 Express locking

OK so I have read a fair amount about SQL Server's locking stuff, but I'm struggling to understand it all.
What I want to achieve is thus:
I need to be able to lock a row when user A SELECTs it
If user B then tries to SELECT it, my WinForms .NET app needs to set all the controls on the relevant form to be disabled, so the user can't try to update. Also, it would be nice if I could throw up a message box for user B, stating that user A is the person that is using that row.
So basically User B needs to be able to SELECT the data, but when they do so, they should also get a) whether the record is locked and b) who has it locked.
I know people are gonna say I should just let SQL Server deal with the locking, but I need User B to know that the record is in use as soon as they SELECT it, rather than finding out when they UPDATE - by which time they may have entered data into the form, giving me inconsistency.
Also any locks need to allow SELECTs to still happen - so when user B does his SELECT, rather than just being thrown an exception and receiving no/incomplete data, he should still get the data, and be able to view it, but just not be able to update it.
I'm guessing this is pretty basic stuff, but there's so much terminology involved with SQL Server's locking that I'm not familiar with that it makes reading about it pretty difficult at the moment.
Thanks
To create this type of 'application lock', you may want to use a table called Locks and insert key, userid, and table names into it.
When your select comes along, join into the Locks table and use the presence of this value to indicate the record is locked.
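Such a Locks table might look like this (a sketch; all names are assumptions):
CREATE TABLE Locks (
    TableName sysname      NOT NULL,
    RowKey    int          NOT NULL,
    UserId    nvarchar(50) NOT NULL,
    LockedAt  datetime     NOT NULL DEFAULT GETDATE(),
    CONSTRAINT PK_Locks PRIMARY KEY (TableName, RowKey)
);
-- The primary key doubles as the race guard: a second user's INSERT for
-- the same row fails, so only one user can hold a given lock at a time.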
I would also recommend adding a 'RowVersion' column to your table you wish to protect. This field will assist in identifying if you are updating or querying a row that has changed since you last selected it.
This isn't really what SQL Server locking is for - ideally you should only be keeping a transaction (and therefore a lock) open for the absolute minimum needed to complete an atomic operation against that database - you certainly shouldn't be holding locks while waiting for user input.
You would be better served keeping track of these sorts of locks yourself by (for example) adding a locked bit column to the table in question along with a locked_by varchar column to keep track of who has the row locked.
The first user should UPDATE the row to indicate that the row is locked and who has it locked:
UPDATE MyTable
SET locked = 1,
    locked_by = @me
WHERE locked = 0
-- (plus a predicate on the row's key, so only the intended row is locked)
The locked = 0 check is there to protect against potential race conditions and make sure that you don't update a record that someone else has already locked.
The first user then does a SELECT to return the data and to confirm that they really did manage to lock the row.
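That verification might look like this (a sketch reusing the column names above, with the row's key predicate added back in):
SELECT *
FROM MyTable
WHERE locked = 1
  AND locked_by = @me;
-- Getting the row back confirms this user holds the lock;
-- an empty result means someone else won the race.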

Getting deadlocks in MySQL

We're getting deadlocks in MySQL, which is very frustrating. It isn't because of exceeding a lock timeout, as the deadlocks happen instantly when they do happen. Here's the SQL code that is executing on 2 separate threads (with 2 separate connections from the connection pool) that produces a deadlock:
UPDATE Sequences SET Counter = LAST_INSERT_ID(Counter + 1) WHERE Sequence IS NULL
The Sequences table has 2 columns: Sequence and Counter.
The LAST_INSERT_ID allows us to retrieve this updated counter value, as per MySQL's recommendation. That works perfectly for us, but we get these deadlocks! Why are we getting them and how can we avoid them?
Thanks so much for any help with this.
EDIT: this is all in a transaction (required since I'm using Hibernate), and AUTO_INCREMENT doesn't make sense here. I should've been more clear. The Sequences table holds many sequences (in our case about 100 million of them). I need to increment a counter and retrieve that value. AUTO_INCREMENT plays no role in any of this; it has nothing to do with IDs or PRIMARY KEYs.
Wrap your SQL statements in a transaction. If you aren't using a transaction, you will get a race condition on LAST_INSERT_ID.
But really, you should make the counter fields AUTO_INCREMENT, so you let MySQL handle this.
A third solution is to use LOCK TABLES to lock the sequence table so no other process can access it concurrently. This is probably the slowest solution, unless you are using InnoDB.
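A sketch of that approach (MySQL syntax, reusing the statement from the question):
-- Note: LOCK TABLES implicitly commits any open transaction, which may
-- not play well with Hibernate-managed transactions.
LOCK TABLES Sequences WRITE;
UPDATE Sequences SET Counter = LAST_INSERT_ID(Counter + 1) WHERE Sequence IS NULL;
SELECT LAST_INSERT_ID();
UNLOCK TABLES;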
Deadlocks are a normal part of any transactional database, and can occur at any time. Generally, you are supposed to write your application code to handle them, as there is no surefire way to guarantee that you will never get a deadlock. That being said, there are situations that increase the likelihood of deadlocks occurring, such as the use of large transactions, and there are things you can do to mitigate their occurrence.
First thing, you should read this manual page to get a better understanding of how you can avoid them.
Second, if all you're doing is updating a counter, you should really, really, really be using an AUTO_INCREMENT column for Counter rather than relying on a "select then update" process, which as you have seen is a race condition that can produce deadlocks. Essentially, the AUTO_INCREMENT property of your table column will act as a counter for you.
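A minimal sketch of that idea (the ticket-table name is an assumption):
CREATE TABLE counter_ticket (
    id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY
);
INSERT INTO counter_ticket () VALUES ();
SELECT LAST_INSERT_ID();  -- the freshly assigned counter value, no explicit locking needed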
Finally, I'm going to assume that you have that update statement inside a transaction, as this would produce frequent deadlocks. If you want to see it in action, try the experiment listed here. That's exactly what's happening with your code... two threads are attempting to update the same records at the same time before one of them is committed. Instant deadlock.
Your best solution is to figure out how to do it without a transaction, and AUTO_INCREMENT will let you do that.
No other SQL involved? Seems a bit unlikely to me.
The 'where sequence is null' probably causes a full table scan, causing read locks to be acquired on every row/page/... .
This becomes a problem if (your particular engine does not use MVCC and) there were an INSERT that preceded your update within the same transaction. That INSERT would have acquired an exclusive lock on some resource (row/page/...), which will cause the acquisition of a read lock by any other thread to go waiting. So two connections can first do their insert, causing each of them to have an exclusive lock on some small portion of the table, and then they both try to do your update, requiring each of them to be able to acquire a read lock on the entire table.
I managed to do this using a MyISAM table for the sequences.
I then have a function called getNextCounter that does the following:
performs a SELECT sequence_value FROM sequences WHERE sequence_name = 'test';
performs the update: UPDATE sequences SET sequence_value = LAST_INSERT_ID(last_retrieved_value + 1) WHERE sequence_name = 'test' AND sequence_value = last_retrieved_value;
repeats in a loop until both queries are successful, then retrieves the last insert id.
As it is a MyISAM table it won't be part of your transaction, so the operation won't cause any deadlocks.
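Put together, one iteration of that loop looks like this (MySQL; @last is a user variable standing in for the value fetched in the first step):
SELECT sequence_value INTO @last
FROM sequences WHERE sequence_name = 'test';
UPDATE sequences
SET sequence_value = LAST_INSERT_ID(@last + 1)
WHERE sequence_name = 'test'
  AND sequence_value = @last;  -- 0 rows affected means another client won: retry
SELECT LAST_INSERT_ID();  -- the reserved counter value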