Concurrent insert of keys into a table - sql

Probably a trivial question, but I want to get the best possible solution.
Problem:
I have two or more workers that insert keys into one or more tables. The problem arises when two or more workers try to insert the same key into one of those key tables at the same time.
Typical problem.
Worker A checks whether the key exists (SELECT). It does not.
Worker B checks whether the key exists (SELECT). It does not.
Worker A inserts the key.
Worker B inserts the key.
Worker A commits.
Worker B commits. An exception is thrown because the unique constraint is violated.
The key tables are simple pairs. First column is autoincrement integer and the second is varchar key.
What is the best solution to such a concurrency problem? I believe it is a common problem. One way for sure is to handle the exceptions thrown, but somehow I don't believe this is the best way to tackle this.
The database I use is Firebird 2.5
EDIT:
Some additional info to make things clear.
Client side synchronization is not a good approach, because the inserts come from different processes (workers). And I could have workers across different machines someday, so even mutexes are a no-go.
The primary key, which is the first column of such a table, is an autoincrement field. No problem there. The varchar field is the problem, as it is the value that the client inserts.
Typical such table is a table of users. For instance:
1 2056
2 1044
3 1896
4 5966
...
Each worker checks if user "xxxx" exists and, if not, inserts it.
EDIT 2:
Just for reference, in case somebody goes the same route: IB/FB return a pair of error codes (I am using InterBase Express components). Checking for a duplicate value violation looks like this:
except
  on E: EIBInterBaseError do
  begin
    if (E.SQLCode = -803) and (E.IBErrorCode = 335544349) then
    begin
      FKeysConnection.IBT.Rollback;
      EnteredKeys := False;
    end;
  end;
end;

With Firebird you can use the following statement:
UPDATE OR INSERT INTO MY_TABLE (MY_KEY) VALUES (:MY_KEY) MATCHING (MY_KEY) RETURNING MY_ID
assuming there is a BEFORE INSERT trigger which will generate the MY_ID if a NULL value is being inserted.
Here is the documentation.
Update: The above statement will avoid exceptions and cause every statement to succeed. However, in case of many duplicate key values it will also cause many unnecessary updates.
This can be avoided by another approach: just handle the unique constraint exception on the client and ignore it. The details depend on which Delphi library you're using to work with Firebird but it should be possible to examine the SQLCode returned by the server and ignore only the specific case of unique constraint violation.
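The "insert and ignore the unique violation" approach can be sketched as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for Firebird, the table and column names follow the statement above, and get_or_create_key is a hypothetical helper name. With a Firebird driver you would check for the specific error codes mentioned in the question instead of sqlite3.IntegrityError.

```python
import sqlite3

def get_or_create_key(conn, key):
    """Return the id for `key`, inserting it first if it is missing.

    Hypothetical helper: the UNIQUE constraint decides who wins a race,
    and the loser simply reads the row the winner created.
    """
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            cur = conn.execute(
                "INSERT INTO my_table (my_key) VALUES (?)", (key,)
            )
            return cur.lastrowid
    except sqlite3.IntegrityError:
        # The key already exists (possibly inserted by another worker).
        row = conn.execute(
            "SELECT my_id FROM my_table WHERE my_key = ?", (key,)
        ).fetchone()
        return row[0]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table ("
    " my_id INTEGER PRIMARY KEY,"
    " my_key TEXT NOT NULL UNIQUE)"
)

first = get_or_create_key(conn, "2056")   # inserts the key
second = get_or_create_key(conn, "2056")  # hits the constraint, reads the row
```

Both calls return the same id and the table ends up with a single row, without any client-side locking.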

I do not know if something like this is available in Firebird, but in SQL Server you can check for the key as part of the insert:
insert into Table1 (KeyValue)
select 'NewKey'
where not exists (select *
                  from Table1
                  where KeyValue = 'NewKey')
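For anyone who wants to try the statement quickly, here is a small sketch using Python's sqlite3 module (an assumption made for convenience; SQLite accepts the same INSERT ... SELECT ... WHERE NOT EXISTS form). The Id column and the UNIQUE constraint are added for the example, and insert_if_missing is a hypothetical name. The constraint is kept on purpose: on a multi-user server two sessions can still both pass the NOT EXISTS check at the same moment, so the statement reduces collisions rather than ruling them out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The UNIQUE constraint stays as a safety net (assumption for this sketch).
conn.execute(
    "CREATE TABLE Table1 (Id INTEGER PRIMARY KEY, KeyValue TEXT UNIQUE)"
)

INSERT_IF_MISSING = (
    "INSERT INTO Table1 (KeyValue) "
    "SELECT ? "
    "WHERE NOT EXISTS (SELECT 1 FROM Table1 WHERE KeyValue = ?)"
)

def insert_if_missing(key):
    """Run the conditional insert in its own short transaction."""
    with conn:
        conn.execute(INSERT_IF_MISSING, (key, key))

insert_if_missing("NewKey")
insert_if_missing("NewKey")   # second call inserts nothing
insert_if_missing("OtherKey")

row_count = conn.execute("SELECT COUNT(*) FROM Table1").fetchone()[0]
```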

First option - don't do it.
Don't do it. Unless the workers are doing extraordinary amounts of work (we're talking about computers, so requiring 1 second per record qualifies as an "extraordinary amount of work"), just use a single thread. Even better, do all the work in a stored procedure; you'd be amazed by the speedup gained by not transporting data over whatever protocol into your app.
Second option - Use a Queue
Make sure your worker threads don't all work on the same ID. Set up a queue, push all the IDs that need processing into that queue, and have each worker thread dequeue an ID from it. This way you're guaranteed that no two workers work on the same record at the same time. This might be difficult to implement if your workers are not all part of the same process.
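A minimal in-process sketch of the queue idea, assuming the workers are threads of one Python program (which, as noted, is not the case when workers live in separate processes or machines). The key values and the worker function are made up for the illustration; duplicates are removed before the keys are queued, so each key is handed to exactly one worker.

```python
import queue
import threading

incoming = ["2056", "1044", "2056", "1896", "5966", "1044"]

work = queue.Queue()
# Deduplicate while keeping order, so a key is queued only once.
for key in dict.fromkeys(incoming):
    work.put(key)

processed = []
processed_lock = threading.Lock()

def worker():
    """Take keys off the shared queue until it is empty."""
    while True:
        try:
            key = work.get_nowait()
        except queue.Empty:
            return
        # Placeholder for the real work (e.g. inserting the key).
        with processed_lock:
            processed.append(key)

threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```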
Last resort
Set up a DB-based "reservation" system so a worker thread can mark a key as "work in progress", so that no two workers work on the same key. I'd set up a table like this:
CREATE TABLE KEY_RESERVATIONS (
KEY INTEGER NOT NULL, /* This is the KEY you'd be reserving */
RESERVED_UNTIL TIMESTAMP NOT NULL /* We don't want to keep reservations for ever in case of failure */
);
Each of your workers would use short transactions to work on that table: select a candidate key, one that's not in the KEY_RESERVATIONS table. Try to INSERT. Failed? Try another key. Periodically delete all reserved keys with old RESERVED_UNTIL timestamps. Make sure the transactions for working with KEY_RESERVATIONS are as short as possible, so that two threads both trying to reserve the same key at the same time fail quickly.
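A rough sketch of that reservation loop, again with Python's sqlite3 standing in for Firebird. Several things here are assumptions: the column is called key_id and carries a PRIMARY KEY (without some unique constraint the second INSERT would not fail), timestamps are stored as plain numbers, and try_reserve is a hypothetical helper.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE key_reservations ("
    " key_id INTEGER NOT NULL PRIMARY KEY,"  # assumption: unique per key
    " reserved_until REAL NOT NULL)"
)

def try_reserve(key_id, ttl_seconds=30.0, now=None):
    """Try to reserve key_id; return True on success, False if it is taken."""
    now = time.time() if now is None else now
    try:
        with conn:  # one short transaction per attempt
            # Drop reservations that have expired.
            conn.execute(
                "DELETE FROM key_reservations WHERE reserved_until < ?",
                (now,),
            )
            conn.execute(
                "INSERT INTO key_reservations (key_id, reserved_until)"
                " VALUES (?, ?)",
                (key_id, now + ttl_seconds),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # somebody else holds the reservation

got_it = try_reserve(1, now=100.0)         # reserved until 130.0
taken = try_reserve(1, now=110.0)          # still reserved
after_expiry = try_reserve(1, now=200.0)   # old reservation has expired
```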

This is what you have to deal with in an optimistic (or no-) locking scheme.
One way to avoid it is to put a pessimistic lock on the table around the whole select, insert, commit sequence.
However, that means you will have to deal with not being able to access the table (handle table-locked exceptions).
If by workers you mean threads in the same application instance instead of different users (application instances), you will need thread synchronization like kubal5003 says around the select-insert-commit sequence.
A combination of the two is needed if you have multiple users/application instances each with multiple threads.

Synchronize your threads to make it impossible to insert the same value, or use a DB-side key generation method. I don't know Firebird, so I don't know whether it has one; MS SQL Server, for example, has identity columns, and GUIDs also solve the problem because generating two identical ones is very unlikely.

You should not rely on the client to generate the unique key if there is a possibility of duplicates.
Use triggers and generators (maybe with the help of a stored procedure) to always create unique keys.
More information about proper autoinc implementation in Firebird here: http://www.firebirdfaq.org/faq29/

Related

SQLite: Autoincrement and insert or ignore will produce unused Autoincrement Keys

I'm using a table Mail with auto-increment Id and Mail Address. The table is used in 4 other tables and it is mainly used to save storage (String is only saved once and not 4 times). I'm using INSERT OR IGNORE to just blindly add the mail addresses to the table and if it exists ignore the update. This approach is MUCH faster than checking the existence with SELECT ... and do an INSERT if needed.
For every INSERT OR IGNORE, the auto-increment Id is incremented, no matter whether the insert is ignored or carried out. In one run I have approx. 500k data sets to process, so after every run the last auto-increment key has increased by 500k. I know there are 2^63-1 possible keys, so it would take a long time to use them all up.
I also tried INSERT OR REPLACE, but this will increment the Id of the dataset on every run of the command, so this is not a solution at all.
Is there a way to prevent this increase of auto-increment key on every INSERT OR IGNORE?
Table Mail Example (replaced with pseudo Addresses)
mIdMail mMail
"1" ""
"7" "mail1#example.com"
"15" "mail2#example.com"
"17" "mail3#example.com"
"19" "mail4#example.com"
"23" "mail5#example.com"
...
Insert Query (Using Java Lib: org.apache.commons.dbutils)
INSERT OR IGNORE
INTO MAIL
( mMail )
VALUES ( ? );
Table Definition
CREATE TABLE IF NOT EXISTS MAIL (
mIdMail INTEGER PRIMARY KEY AUTOINCREMENT,
mMail CHAR(90) UNIQUE
);
To get autoincrementing values without gaps, drop the AUTOINCREMENT keyword. (Yes, you get autoincrementing values even without it.)
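A quick way to check this, sketched with Python's sqlite3 module (the question uses Java; the SQL is the same). The table is the one from the question minus the AUTOINCREMENT keyword, and the addresses are made up. Without AUTOINCREMENT, SQLite normally assigns one more than the largest id currently in the table, so ignored inserts leave no gaps. The trade-off is that the id of a deleted highest row can be handed out again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS MAIL ("
    " mIdMail INTEGER PRIMARY KEY,"   # no AUTOINCREMENT keyword
    " mMail CHAR(90) UNIQUE)"
)

addresses = [
    "a@example.com",
    "b@example.com",
    "a@example.com",  # duplicate, ignored
    "a@example.com",  # duplicate, ignored
    "c@example.com",
]
for address in addresses:
    conn.execute("INSERT OR IGNORE INTO MAIL (mMail) VALUES (?)", (address,))
conn.commit()

rows = conn.execute(
    "SELECT mIdMail, mMail FROM MAIL ORDER BY mIdMail"
).fetchall()
```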
Auto-increment keys behave the way they do specifically because the database guarantees their behavior -- regardless of concurrent transactions and transaction failures.
Auto-increment keys have two guarantees:
They are increasing, so later inserts have larger values than earlier ones.
They are guaranteed to be unique.
The mechanism for allocating the keys does not guarantee no gaps. Why not? Because no-gaps would incur a lot more overhead on the database. Basically, each transaction on the table would need to be completely serialized (that is completed and committed) before the next one can take place. Generally, that is a really bad idea from a performance perspective.
Unfortunately, SQLite doesn't have the simplest solution, which is simply to call row_number() on the auto-incremented keys. You could try to implement a gapless auto-increment using triggers, significantly slowing down your application.
My real suggestion is simply to live with the gaps. Accept them. Surrender. That is how the built-in method works, and for good reason. Now design the rest of the database/application keeping this in mind.
I had the same issue, and changing "INSERT OR IGNORE" into "INSERT OR FAIL" solved the problem, so now when it fails the id value doesn't increment.

Locking table in postgresql

I have a table named 'games', which contains a column named 'title'. This column is unique, and the database used is PostgreSQL.
I have a user input form that allows the user to insert a new 'game' into the 'games' table. The function that inserts a new game checks whether a previously entered 'game' with the same 'title' already exists; for this, I get the count of rows with the same game 'title'.
I use transactions for this: the insert function uses BEGIN at the start, gets the row count, inserts the new row if the row count is 0, and COMMITs the changes once the process is completed.
The problem is that if two games with the same title are submitted at the same time, both might be inserted, since I just get the count of rows to check for duplicate records and the transactions are isolated from each other.
I thought of locking the tables when getting the row count as:
LOCK TABLE games IN ACCESS EXCLUSIVE MODE;
SELECT count(id) FROM games WHERE games.title = 'new_game_title'
Which would lock the table for reading too (which means the other transaction would have to wait, until the current one is completed successfully). This would solve the problem, which is what I suspect. Is there a better way around this (avoiding duplicate games with the same title)
You should NOT need to lock your tables in this situation.
Instead, you can use one of the following approaches:
Define UNIQUE index for column that really must be unique. In this case, first transaction will succeed, and second will error out.
Define AFTER INSERT OR UPDATE OR DELETE trigger that will check your condition, and if it does not hold, it should RAISE error, which will abort offending transaction
In all these cases, your client code should be ready to properly handle possible failures (like failed transactions) that could be returned by executing your statements.
Using the highest transaction isolation level (Serializable), you can achieve something similar to what you actually asked for. But be aware that a transaction may then fail with: ERROR: could not serialize access due to concurrent update
I do not agree with the constraint approach entirely. You should have a constraint to protect data integrity, but relying on the constraint forces you to identify not only what error occurred, but which constraint caused the error. The trouble is not catching the error as some have discussed but identifying what caused the error and providing a human readable reason for the failure. Depending on which language your application is written in, this can be next to impossible. eg: telling the user "Game title [foo] already exists" instead of "game must have a price" for a separate constraint.
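How hard it is to tell the constraints apart depends on the driver. As a small illustration only, here is a sketch with Python's sqlite3 module, where the error text names the offending column. SQLite is an assumption here, since the question is about PostgreSQL, whose drivers report this differently. The price column and the add_game helper are made up to mirror the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE games ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL UNIQUE,"
    " price REAL NOT NULL)"
)

def add_game(title, price):
    """Insert a game and translate constraint errors into readable text."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO games (title, price) VALUES (?, ?)",
                (title, price),
            )
        return "ok"
    except sqlite3.IntegrityError as exc:
        detail = str(exc)  # e.g. "UNIQUE constraint failed: games.title"
        if detail.startswith("UNIQUE") and "games.title" in detail:
            return f"Game title [{title}] already exists"
        if "games.price" in detail:
            return "Game must have a price"
        raise  # unknown constraint: do not guess

first = add_game("foo", 9.99)
duplicate = add_game("foo", 4.99)
no_price = add_game("bar", None)
```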
There is a single statement alternative to your two stage approach:
INSERT INTO games ( [column1], ... )
SELECT [value1], ...
WHERE NOT EXISTS ( SELECT 1 FROM games AS g2 WHERE g2.title = [new title] );
I want to be clear with this... this is not an alternative to having a unique constraint (which requires extra data for the index). You must have one to protect your data from corruption.

SELECT ... FOR UPDATE and conflicting inserts

I want to insert a row, but if a conflict occurs (example below) I'd like the database to lock the existing row so I can log its contents for debugging purposes. I am using READ_COMMITTED transaction isolation.
For example:
CREATE TABLE users(id BIGINT AUTO_INCREMENT, username VARCHAR(30),
count INT NOT NULL, PRIMARY KEY(id), UNIQUE(username));
Thread 1:
INSERT INTO users(username, count) VALUES('joe', 1000);
transaction.commit();
Thread 2:
// Insert fails due to conflict with above record
INSERT INTO users(username, count) VALUES('joe', 0);
// Get the conflicting row and log its properties
SELECT * FROM users WHERE username = 'joe';
If the conflicting row is not locked, it may be modified by the time I check it. The only workaround I found is invoking SELECT id FROM users WHERE username = 'joe' FOR UPDATE before the insert. Is there a way to implement this without any overhead when a conflict does not occur?
UPDATE: I am not asking to avoid the conflict or the resulting SQLException. I am just asking for the conflicting row to get locked so I can look up what values triggered the conflict. Yes, I know that the conflicting record contains joe but I want to log all its other columns.
No, it is not possible to eliminate the conflict on a UNIQUE column
when using INSERT of rows with unique column(s).
Trying to write SQL that never has to deal with SQL Exceptions
is just wasted effort that always ends up creating SQL that fails
under some conditions.
Exception handling can't be avoided when dealing with real time
multi-threaded multi-user database servers, unless you
can afford to lock the table, do the update, and unlock the
table (which will create terrible performance when under
heavy load of many users)
The UNIQUE CONSTRAINT VIOLATION Exception will ALWAYS occur on the 2nd INSERT,
as the two INSERTs in your example could be widely separated in time
(e.g. by hours, days or weeks); Table or row locking won't change this.
This problem is one that should be solved at the GUI level anyway
as choosing a "user name" that may already be chosen by a previous
user, requires providing the "new" user with feedback like
"Sorry, that user name is already in use by another user", so
it would seem unlikely that handling the UNIQUE VIOLATION exception
can or should ever be "avoided".
In addition, there is no reason to SELECT ... FOR UPDATE, since
all you need to do is SELECT id WHERE name = newName and see if
you get a resulting id or null; (id == null) => user name not in use,
but even then two users could both get the "not in use" result
at the same time and one of the INSERTs could still fail.
When the UNIQUE exception is returned on the duplicate INSERT,
the second INSERT has failed and that record was not created,
so there is no "duplicate" record to lock and then read after
the UNIQUE exception is returned on the failed INSERT.
Which version of SQL are you using? I'm not sure I understand your question correctly, but I think you could do this in a trigger.
In the trigger, you can view the inserted value (your conflicting row), log it, and roll back. This means that when you insert your row and no conflict occurs, you don't have to do anything extra, and when a conflict occurs, the log entry is made and the row is not inserted.
No, most databases do not support that kind of operation.
You can do tricks like creating an explicit transaction
BEGIN TRANSACTION
IF EXISTS (SELECT ...)
ROLLBACK
INSERT INTO...
COMMIT
But that isn't exactly what you want. The only way to achieve what you're asking for is to use one of the B-tree style libraries, which are a lot more low-level.
There doesn't seem to be a portable way of doing this and looking at MVCC there is a strong indication that this cannot be implemented without a substantial performance impact.
So in conclusion: you're going to have to settle for knowing that a conflict occurred without being 100% sure of the cause (there is no thread-safe way to verify it).

How can I get the Primary Key id of a file I just INSERTED?

Earlier today I asked this question which arose from A- My poor planning and B- My complete disregard for the practice of normalizing databases. I spent the last 8 hours reading about normalizing databases and the finer points of JOIN and worked my way through the SQLZoo.com tutorials.
I am enlightened. I understand the purpose of database normalization and how it can suit me. Except that I'm not entirely sure how to execute that vision from a procedural standpoint.
Here's my old vision: 1 table called "files" that held, let's say, a file id, a file url and the appropriate grade levels for that file.
New vision!: 1 table for "files", 1 table for "grades", and a junction table to mediate.
But that's not my problem. This is a really basic Q that I'm sure has an obvious answer- When I create a record in "files", it gets assigned the incremented primary key automatically (file_id). However, from now on I'm going to need to write that file_id to the other tables as well. Because I don't assign that id manually, how do I know what it is?
If I upload text.doc and it gets file_id 123, how do I know it got 123 in order to write it to "grades" and the junction table? I can't do a max(file_id) because if you have concurrent users, you might nab a different id. I just don't know how to get the file_id value without having manually assigned it.
You may want to use LAST_INSERT_ID() as in the following example:
START TRANSACTION;
INSERT INTO files (file_id, url) VALUES (NULL, 'text.doc');
INSERT INTO grades (file_id, grade) VALUES (LAST_INSERT_ID(), 'some-grade');
COMMIT;
The transaction ensures that the operation remains atomic: This guarantees that either both inserts complete successfully or none at all. This is optional, but it is recommended in order to maintain the integrity of the data.
For LAST_INSERT_ID(), the most recently generated ID is maintained in the server on a per-connection basis. It is not changed by another client. It is not even changed if you update another AUTO_INCREMENT column with a nonmagic value (that is, a value that is not NULL and not 0).
Using LAST_INSERT_ID() and AUTO_INCREMENT columns simultaneously from multiple clients is perfectly valid. Each client will receive the last inserted ID for the last statement that client executed.
Source and further reading:
MySQL Reference: How to Get the Unique ID for the Last Inserted Row
MySQL Reference: START TRANSACTION, COMMIT, and ROLLBACK Syntax
In PHP to get the automatically generated ID of a MySQL record, use mysqli->insert_id property of your mysqli object.
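Most client libraries expose the generated id in a similar per-connection way. As a rough illustration only, here is the same two-insert flow with Python's sqlite3 module, whose cursor has a lastrowid attribute. This is a stand-in for the MySQL/PHP setup in the question; the table layout follows the SQL example above, and add_file is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE files (
        file_id INTEGER PRIMARY KEY,
        url TEXT NOT NULL
    );
    CREATE TABLE grades (
        file_id INTEGER NOT NULL REFERENCES files(file_id),
        grade TEXT NOT NULL
    );
    """
)

def add_file(url, grade):
    """Insert a file and its grade in one transaction; return the new id."""
    with conn:  # both inserts commit together or not at all
        cur = conn.execute("INSERT INTO files (url) VALUES (?)", (url,))
        file_id = cur.lastrowid  # id generated for the row just inserted
        conn.execute(
            "INSERT INTO grades (file_id, grade) VALUES (?, ?)",
            (file_id, grade),
        )
    return file_id

doc_id = add_file("text.doc", "some-grade")
other_id = add_file("other.doc", "another-grade")
```

The id comes from the cursor that ran the INSERT, so there is no need for max(file_id) and no exposure to what other connections insert.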
How are you going to find the entry tomorrow, after your program has forgotten the value of last_insert_id()?
Using a surrogate key is fine, but your table still represents an entity, and you should be able to answer the question: what measurable properties define this particular entity? The set of these properties is the natural key of your table, and even if you use surrogate keys, such a natural key should always exist and you should use it to retrieve information from the table. Use the surrogate key to enforce referential integrity, for indexing purposes and to make joins easier on the eye. But don't let them escape from the database.

Does this query guarantee me a 'race free' PK value?

I was just reading How to avoid a database race condition when manually incrementing PK of new row.
There was a lot of good suggestions like having a separate table to get the PK values.
So I wonder if a query like this:
INSERT INTO Party VALUES(
(SELECT MAX(id)+1 FROM
(SELECT id FROM Party) as x),
'A-XXXXXXXX-X','Joseph')
could avoid race conditions?
Is the whole statement guaranteed to be atomic? Is it in MySQL? In PostgreSQL?
The best way to avoid race conditions while creating primary keys in a relational database is to allow the database to generate the primary keys.
It would work on tables which use table-level locking (MyISAM), but on InnoDB etc. it could deadlock or produce duplicate keys, I think, depending on the isolation level in use.
In any case doing this is an extremely bad idea as it won't work well in the general case, but might appear to work during low-concurrency testing. It's a recipe for trouble.
You'd be better off using another table and incrementing a value in there; that's more likely to be race-free / deadlock-free.
No, you still have a problem, as, if two queries try to increment at the same time there may be a situation where the inner select is done, then another query is processed.
Your best bet, if you want a guarantee, if you don't want the database doing it, is to have a unique key on there.
In the event that there is an error in inserting, then try your query again, and once the primary key is unique it will work.
In this case, your best bet is to first insert only the id and any other non-null columns, and then do an update to set the nullable columns to whatever is correct.
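The "unique key plus retry" idea could look roughly like the sketch below. It uses Python's sqlite3 module purely for illustration (the question is about MySQL and PostgreSQL); the column names code and name and the insert_party helper are made up. With a single in-memory connection the retry branch never actually runs, so treat it as a shape to adapt rather than a tested concurrency fix. Letting the database generate the key remains the simpler option.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Party ("
    " id INTEGER PRIMARY KEY,"  # uniqueness is what makes the retry safe
    " code TEXT,"
    " name TEXT)"
)

def insert_party(code, name, max_attempts=5):
    """Insert with id = MAX(id) + 1, retrying if another writer took that id."""
    for _ in range(max_attempts):
        try:
            with conn:
                cur = conn.execute(
                    "INSERT INTO Party (id, code, name) "
                    "SELECT COALESCE(MAX(id), 0) + 1, ?, ? FROM Party",
                    (code, name),
                )
            return cur.lastrowid
        except sqlite3.IntegrityError:
            continue  # the id was taken in the meantime; recompute and retry
    raise RuntimeError("could not allocate an id")

first_id = insert_party("A-XXXXXXXX-X", "Joseph")
second_id = insert_party("B-XXXXXXXX-X", "Maria")
```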