I have a table named 'games' with a column named 'title', which is unique. The database is PostgreSQL.
I have a user input form that lets a user insert a new 'game' into the 'games' table. The function that inserts a new game checks whether a 'game' with the same 'title' already exists by counting the rows with that 'title'.
I use a transaction for this: the insert function starts with BEGIN, gets the row count, inserts the new row if the count is 0, and COMMITs once the process is complete.
The problem is that if two games with the same title are submitted at the same time, both could be inserted, since I only check the row count for duplicates and each transaction is isolated from the other.
I thought of locking the table when getting the row count:
LOCK TABLE games IN ACCESS EXCLUSIVE MODE;
SELECT count(id) FROM games WHERE games.title = 'new_game_title';
This would lock the table for reads too, which means the other transaction would have to wait until the current one completes. I suspect this would solve the problem. Is there a better way to avoid duplicate games with the same title?
You should NOT need to lock your tables in this situation.
Instead, you can use one of the following approaches:
Define a UNIQUE index on the column that really must be unique. In this case, the first transaction will succeed and the second will fail with an error.
Define an AFTER INSERT OR UPDATE OR DELETE trigger that checks your condition and RAISEs an error if it does not hold, which aborts the offending transaction.
In all these cases, your client code should be ready to properly handle possible failures (like failed transactions) that could be returned by executing your statements.
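As a rough illustration of "let the unique index reject the duplicate and handle the failure in client code", here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs on its own; the add_game helper is a hypothetical name, and with a real PostgreSQL driver the exception class would differ.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; the UNIQUE constraint behaves alike here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE games (id INTEGER PRIMARY KEY, title TEXT NOT NULL UNIQUE)"
)


def add_game(conn, title):
    """Insert a game; return True on success, False if the title is taken."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("INSERT INTO games (title) VALUES (?)", (title,))
        return True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint rejected the row; no table lock or pre-check needed.
        return False


first = add_game(conn, "new_game_title")
second = add_game(conn, "new_game_title")
count = conn.execute("SELECT count(*) FROM games").fetchone()[0]
```

The second call reports failure instead of inserting a duplicate, which is the case the client code has to be prepared for.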
Using the highest transaction isolation level (SERIALIZABLE), you can achieve something similar to what you actually asked for. But be aware that a transaction may then fail with ERROR: could not serialize access due to concurrent update
I do not entirely agree with the constraint approach. You should have a constraint to protect data integrity, but relying on the constraint forces you to identify not only which error occurred, but also which constraint caused it. The trouble is not catching the error, as some have discussed, but identifying what caused it and giving the user a human-readable reason for the failure. Depending on which language your application is written in, this can be next to impossible. For example: telling the user "Game title [foo] already exists" instead of "game must have a price" for a separate constraint.
There is a single statement alternative to your two stage approach:
INSERT INTO games ( [column1], ... )
SELECT [value1], ...
WHERE NOT EXISTS ( SELECT 1 FROM games AS g2 WHERE g2.title = [title_value] );
I want to be clear about this: it is not an alternative to having a unique constraint (which requires extra storage for the index). You must have one to protect your data from corruption.
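For what it is worth, the single-statement form can be tried out with a small sketch like the one below. It assumes SQLite (via Python's sqlite3 module) in place of PostgreSQL, and insert_if_absent is a made-up helper name. The table keeps its UNIQUE constraint, as recommended above, because the NOT EXISTS check alone does not protect against two concurrent transactions.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL. The UNIQUE constraint stays in place.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE games (id INTEGER PRIMARY KEY, title TEXT NOT NULL UNIQUE)"
)

INSERT_IF_ABSENT = (
    "INSERT INTO games (title) "
    "SELECT :title "
    "WHERE NOT EXISTS (SELECT 1 FROM games AS g2 WHERE g2.title = :title)"
)


def insert_if_absent(conn, title):
    """Run the single-statement insert; return True if a row was added."""
    with conn:
        cur = conn.execute(INSERT_IF_ABSENT, {"title": title})
    # rowcount is 1 when the SELECT produced a row to insert, 0 otherwise.
    return cur.rowcount == 1


added_first = insert_if_absent(conn, "foo")
added_again = insert_if_absent(conn, "foo")
total = conn.execute("SELECT count(*) FROM games").fetchone()[0]
```

Checking the affected row count lets the application say "Game title [foo] already exists" without having to parse a constraint error.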
I have the following table called Manager:
Store | Person
ABC   | Ms. Elara
XYZ   | Mr. Saros
DEF   | Ms. Orion
I am learning about triggers in PostgreSQL and I am trying to write a trigger for the above table to check and disallow any insert or update of a Person in the Manager table to be a manager of more than one store.
My approach is that in the trigger function, you select tuples from the Manager table where the Person is equal to the Person in the tuple you are trying to insert. If there are tuples in the Manager table where the Person is equal to the Person in the new tuple to be inserted, then the row will not be inserted.
I am not sure how in the trigger function I can select all the tuples from Manager where the Person is equal to the Person to be inserted. Any insights are appreciated.
You could also use a unique index:
CREATE UNIQUE INDEX Manager_IDX ON Manager (person);
This would also work if you tried to update a person's name to a name that already exists. It would also have the benefit of making it quick to query the table by a person's name.
You did say you wanted to learn about triggers. One way to start might be to have a trigger (actually three: an insert, an update and a delete trigger) on Manager that automatically keeps a historical record of changes (i.e. an audit trail) in another table (e.g. manager_audit). The audit table would have the same definition as the Manager table, plus at least one extra column (e.g. opcode) that records what the operation was, e.g. I for insert, D for delete, U for update. You could also add a timestamp to record when the operation happened, and other fields (e.g. the id of the user who made the change).
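A rough sketch of that audit-trail idea follows. It uses SQLite triggers through Python's sqlite3 module purely so the example runs on its own. PostgreSQL triggers are written differently (a trigger function in PL/pgSQL plus CREATE TRIGGER), so treat the syntax as illustrative. The trigger names and the changed_at column are assumptions.

```python
import sqlite3

# Assumption: SQLite trigger syntax, used only to make the idea runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE manager (store TEXT, person TEXT);

-- Same columns as manager, plus the operation code and a timestamp.
CREATE TABLE manager_audit (
    opcode     TEXT NOT NULL,
    store      TEXT,
    person     TEXT,
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP
);

CREATE TRIGGER manager_ai AFTER INSERT ON manager BEGIN
    INSERT INTO manager_audit (opcode, store, person)
    VALUES ('I', NEW.store, NEW.person);
END;

CREATE TRIGGER manager_au AFTER UPDATE ON manager BEGIN
    INSERT INTO manager_audit (opcode, store, person)
    VALUES ('U', NEW.store, NEW.person);
END;

CREATE TRIGGER manager_ad AFTER DELETE ON manager BEGIN
    INSERT INTO manager_audit (opcode, store, person)
    VALUES ('D', OLD.store, OLD.person);
END;
""")

conn.execute("INSERT INTO manager VALUES ('ABC', 'Ms. Elara')")
conn.execute("UPDATE manager SET store = 'XYZ' WHERE person = 'Ms. Elara'")
conn.execute("DELETE FROM manager WHERE person = 'Ms. Elara'")

opcodes = [
    row[0]
    for row in conn.execute("SELECT opcode FROM manager_audit ORDER BY rowid")
]
```

After the three statements, the audit table holds one row per operation, in the order they happened.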
Having said all of that and for whatever it is worth, enforcing uniqueness on things like name might not be the best idea. It is entirely possible that "Mr. Smith" might be the name of two different people that happen to be working in a large company.
Another factor you might want to consider is performance. If you use a select-based method to ensure uniqueness on a non-indexed column (or columns), that test will require an "all row scan" of the table. This will be true no matter how you implement it (e.g. trigger, function, procedure or whatever).
Put another way, let's say you have 100 managers and you want to insert a new one. A select-based check will require that you scan the entire table (i.e. 100 rows) to see if the name already exists. As the table grows, the scan grows with it. Over time you will encounter the phenomenon of "degraded performance". As the table reaches millions or billions of rows, your "single record insert" could actually require millions or billions of I/Os.
Indexes don't work that way. Indexes are essentially inverted sub-tables where the columns you nominate (e.g. manager.person) are the "key". As such, a lookup needs only a handful of I/Os to perform the necessary check (for a B-tree index, roughly proportional to the logarithm of the number of records), rather than one per row.
Disclaimer for the purists: I'm defining an I/O as a logical "record level" I/O, not a physical disk I/O or any other type of I/O. Thus one record read = 1 I/O, one record write = 1 I/O.
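To see the scan-versus-index difference for yourself, you can ask the database for its query plan before and after creating the index. The sketch below does this with SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. PostgreSQL has its own EXPLAIN with a different output format, so this is only an analogy. The plan helper is a hypothetical name, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; only the idea carries over.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE manager (store TEXT, person TEXT)")


def plan(conn, sql):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " ".join(str(row[-1]) for row in rows)


lookup = "SELECT store FROM manager WHERE person = 'Ms. Elara'"

before = plan(conn, lookup)  # no index yet: the whole table is scanned
conn.execute("CREATE UNIQUE INDEX manager_idx ON manager (person)")
after = plan(conn, lookup)   # the planner now searches via manager_idx
```

Printing before and after shows the plan switching from a table scan to an index search.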
Why do you need a trigger for this? You can write a Postgres function that checks for the existence of a particular person before inserting.
Something like this:
CREATE FUNCTION check_person_exists(par_person_id int) RETURNS void AS $$
DECLARE
    var_exists_flag int;
BEGIN
    -- "table" is a placeholder for the real table name
    SELECT 1 INTO var_exists_flag FROM table WHERE person_id = par_person_id;
    -- FOUND is false when the SELECT returned no row
    IF NOT FOUND THEN
        INSERT INTO ....
    ELSE
        RAISE NOTICE 'the record already exists';
    END IF;
    RETURN;
END;
$$ LANGUAGE plpgsql;
PS: The above is only a sketch and will not run as-is; the table name and the INSERT are left as placeholders.
My question is probably specific to Postgres, but maybe not.
A program which I cannot modify accesses Postgres via npgsql with a simple SELECT command. That is all I know about it.
I also have access via npgsql. The table is defined as:
-- Table: public.n_data
-- DROP TABLE public.n_data;
CREATE TABLE public.n_data
(
u_id integer,
p_id integer NOT NULL,
data text,
CONSTRAINT nc PRIMARY KEY (p_id)
)
WITH (
OIDS=FALSE
);
ALTER TABLE public.n_data
OWNER TO postgres;
(If that info is useful anyway)
I access one single big column, read from it and write back to it.
This all works fine so far.
The question is: how does Postgres handle it if we both write at the same time?
Are there any problems there?
And if Postgres does not handle that automatically, what happens when I read the data, process it, the data changes in the meantime, and I then write my processed data back? That would mean lost data.
It's a bit tricky to test for data integrity, since this data block is huge and corruption is hard to find.
I do it with C#, if that means anything.
Locking in most [1] relational databases (including Postgres) is always at the row level, never at the column level (a relational database has columns and rows, not "cells", "fields" or "records").
If two transactions modify the same row, the second one will have to wait until the first one commits or rolls back.
If two transactions modify different rows then they can do that without any problems as long as they don't modify columns that are part of a unique constraint or primary key to the same value.
Read access to data is never blocked in Postgres by regular DML statements. So yes while one transaction modifies data, another one will see the old data until the first transaction commits the changes ("read consistency").
To handle lost updates you can either use the serializable isolation level or make all transactions follow the pattern that they first need to obtain a lock on the row (select ... for update) and hold that until they are finished. Search for "pessimistic locking" to get more details about this pattern.
Another option is to include a "modified" timestamp in your table. When a process reads the data it also reads the modification timestamp. When it sends back the new changes it includes a where modified_at = <value obtained when reading> - if the data has changed the condition will not hold true and nothing will be updated and you need to restart your transaction. Search for "optimistic locking" to find more details about this pattern.
[1] Some DBMSs do page locking, and some escalate many row-level locks to a table lock. Neither is the case in Postgres.
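The optimistic locking pattern described above could look roughly like the sketch below. Two things are assumptions on my part: it runs against SQLite through Python's sqlite3 module rather than Postgres via npgsql, and it uses an integer version column in place of the "modified" timestamp (same idea, but it does not depend on clock resolution). read_row and write_back are made-up helper names.

```python
import sqlite3

# Assumption: SQLite stands in for Postgres; "version" replaces the timestamp.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE n_data ("
    " p_id INTEGER PRIMARY KEY,"
    " data TEXT,"
    " version INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO n_data (p_id, data) VALUES (1, 'original')")
conn.commit()


def read_row(conn, p_id):
    """Read the data together with the version it had at read time."""
    return conn.execute(
        "SELECT data, version FROM n_data WHERE p_id = ?", (p_id,)
    ).fetchone()


def write_back(conn, p_id, new_data, seen_version):
    """Update only if nobody changed the row since we read it."""
    with conn:
        cur = conn.execute(
            "UPDATE n_data SET data = ?, version = version + 1 "
            "WHERE p_id = ? AND version = ?",
            (new_data, p_id, seen_version),
        )
    return cur.rowcount == 1  # 0 rows updated means we lost the race


# Two writers read the same version, then both try to write back.
_, version_a = read_row(conn, 1)
_, version_b = read_row(conn, 1)
a_ok = write_back(conn, 1, "from A", version_a)
b_ok = write_back(conn, 1, "from B", version_b)
final = read_row(conn, 1)
```

The second writer's update matches no row, so it knows it has to re-read and redo its processing instead of silently overwriting the first writer's change.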
I have the following code:
rating = user.recipe_ratings.where(:recipe_id => recipe.id).where(:delivery_id => delivery.id).first_or_create
Yet somehow we get occasional PG::Error: ERROR: duplicate key value violates unique constraint errors from this. I can't think of any reason that should happen, since the whole point of first_or_create is to prevent those.
Is this just a crazy race-condition? How can I solve this without a maddening series of begin...rescue blocks?
This seems to stem from a typical race condition for the "SELECT or INSERT" case.
Rails seems to choose performance over safety in its implementation. Quoting "Ruby on Rails Guides":
The first_or_create method checks whether first returns nil or not. If it does return nil, then create is called.
...
The SQL generated by this method looks like this:
SELECT * FROM clients WHERE (clients.first_name = 'Andy') LIMIT 1
BEGIN
INSERT INTO clients (created_at, first_name, locked, orders_count, updated_at)
VALUES ('2011-08-30 05:22:57', 'Andy', 0, NULL, '2011-08-30 05:22:57')
COMMIT
If that's the actual implementation (?), it seems completely open for race conditions. Another transaction can easily SELECT between the first transaction's SELECT and INSERT. And then try its own INSERT, which would raise the error you reported, since the first transaction has inserted the row in the meantime.
The time frame for a race condition could be drastically reduced with a data-modifying CTE. Even a safe version would not cost that much more. But I guess they have their reasons.
Compare this safe implementation:
Is SELECT or INSERT in a function prone to race conditions?
Rails 6 adds a new create_or_find_by method that alleviates a possible race condition, with a few drawbacks:
The underlying table must have the relevant columns defined with unique constraints.
A unique constraint violation may be triggered by only one, or at least less than all, of the given attributes. This means that the subsequent find_by! may fail to find a matching record, which will then raise an ActiveRecord::RecordNotFound exception, rather than a record with the given attributes.
While we avoid the race condition between SELECT -> INSERT from find_or_create_by, we actually have another race condition between INSERT -> SELECT, which can be triggered if a DELETE between those two statements is run by another client. But for most applications, that's a significantly less likely condition to hit.
It relies on exception handling to handle control flow, which may be marginally slower.
def create_or_find_by(attributes, &block)
  transaction(requires_new: true) { create(attributes, &block) }
rescue ActiveRecord::RecordNotUnique
  find_by!(attributes)
end
Using your example:
rating = user.recipe_ratings.create_or_find_by(
  recipe_id: recipe.id,
  delivery_id: delivery.id
)
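For readers not on Rails, the same "insert first, look the row up only if the insert hits the unique constraint" idea can be sketched outside ActiveRecord. The version below is an assumption-laden illustration using Python's sqlite3 module, not the Rails implementation. The table layout and the create_or_find helper are made up for the example.

```python
import sqlite3

# Assumption: a simplified recipe_ratings table with a composite unique constraint.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recipe_ratings ("
    " id INTEGER PRIMARY KEY,"
    " recipe_id INTEGER NOT NULL,"
    " delivery_id INTEGER NOT NULL,"
    " UNIQUE (recipe_id, delivery_id))"
)


def create_or_find(conn, recipe_id, delivery_id):
    """Try the INSERT first; fall back to a SELECT on a uniqueness conflict."""
    try:
        with conn:
            cur = conn.execute(
                "INSERT INTO recipe_ratings (recipe_id, delivery_id) "
                "VALUES (?, ?)",
                (recipe_id, delivery_id),
            )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        row = conn.execute(
            "SELECT id FROM recipe_ratings "
            "WHERE recipe_id = ? AND delivery_id = ?",
            (recipe_id, delivery_id),
        ).fetchone()
        if row is None:
            # The INSERT -> SELECT race noted above: the row was deleted in between.
            raise LookupError("conflicting row disappeared before it could be read")
        return row[0]


first_id = create_or_find(conn, 7, 3)
second_id = create_or_find(conn, 7, 3)
rows = conn.execute("SELECT count(*) FROM recipe_ratings").fetchone()[0]
```

Both calls return the same id and only one row exists, without a begin...rescue block at every call site.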
I want to insert a row, but if a conflict occurs (example below) I'd like the database to lock the existing row so I can log its contents for debugging purposes. I am using READ_COMMITTED transaction isolation.
For example:
CREATE TABLE users(id BIGINT AUTO_INCREMENT, username VARCHAR(30),
  count INT NOT NULL, PRIMARY KEY(id), UNIQUE(username));
Thread 1:
INSERT INTO users(username, count) VALUES('joe', 1000);
transaction.commit();
Thread 2:
// Insert fails due to conflict with above record
INSERT INTO users(username, count) VALUES('joe', 0);
// Get the conflicting row and log its properties
SELECT * FROM users WHERE username = 'joe';
If the conflicting row is not locked, it may be modified by the time I check it. The only workaround I found is invoking SELECT id FROM users WHERE username = 'joe' FOR UPDATE before the insert. Is there a way to implement this without any overhead when a conflict does not occur?
UPDATE: I am not asking to avoid the conflict or the resulting SQLException. I am just asking for the conflicting row to get locked so I can look up what values triggered the conflict. Yes, I know that the conflicting record contains joe but I want to log all its other columns.
No, it is not possible to eliminate the conflict on a UNIQUE column when INSERTing rows with unique column(s).
Trying to write SQL that never has to deal with SQL exceptions is just wasted effort that always ends up creating SQL that fails under some conditions.
Exception handling can't be avoided when dealing with real-time, multi-threaded, multi-user database servers, unless you can afford to lock the table, do the update, and unlock the table (which will create terrible performance under a heavy load of many users).
The UNIQUE CONSTRAINT VIOLATION exception will ALWAYS occur on the 2nd INSERT, as the two INSERTs in your example could be widely separated in time (e.g. by hours, days or weeks). Table or row locking won't change this.
This problem is one that should be solved at the GUI level anyway, as choosing a "user name" that may already be taken by a previous user requires providing the "new" user with feedback like "Sorry, that user name is already in use by another user". So it would seem unlikely that handling the UNIQUE VIOLATION exception can or should ever be "avoided".
In addition, there is no reason to SELECT ... FOR UPDATE, since all you need to do is SELECT id WHERE name = newName and see if you get a resulting id or null (id == null means the user name is not in use). But even then, two users could both get the "not in use" result at the same time, and one of the INSERTs could still fail.
When the UNIQUE exception is returned on the duplicate INSERT, the second INSERT has failed and that record was not created, so there is no "duplicate" record to lock and then read after the UNIQUE exception is returned on the failed INSERT.
Which version of SQL are you using? I'm not sure I understand your question correctly, but I think you could do this in a trigger.
In the trigger, you can view the inserted value (your conflicting row), log it, and roll back. This means that when you insert your row and no conflict occurs, you don't have to do anything extra, and when a conflict occurs, the log entry is made and the row is not inserted.
No, most databases do not support that kind of operation.
You can do tricks like creating an explicit transaction
BEGIN TRANSACTION
IF EXISTS(SELECT ...)
ROLLBACK
INSERT INTO...
COMMIT
But that isn't exactly what you want. The only way to achieve what you're asking for is to use one of the B-TREE style libraries, which are a lot more low-level.
There doesn't seem to be a portable way of doing this, and looking at MVCC there is a strong indication that this cannot be implemented without a substantial performance impact.
So in conclusion: you're going to have to settle for knowing that a conflict occurred, but with no way of being 100% sure of the cause (there is no thread-safe way to verify it).
Probably a trivial question, but I want to get the best possible solution.
Problem:
I have two or more workers that insert keys into one or more tables. The problem arises when two or more workers try to insert the same key into one of those key tables at the same time.
Typical problem.
Worker A reads the table if a key exists (SELECT). There is no key.
Worker B reads the table if a key exists (SELECT). There is no key.
Worker A inserts the key.
Worker B inserts the key.
Worker A commits.
Worker B commits. An exception is thrown because the unique constraint is violated.
The key tables are simple pairs. First column is autoincrement integer and the second is varchar key.
What is the best solution to such a concurrency problem? I believe it is a common problem. One way for sure is to handle the exceptions thrown, but somehow I don't believe this is the best way to tackle this.
The database I use is Firebird 2.5
EDIT:
Some additional info to make things clear.
Client side synchronization is not a good approach, because the inserts come from different processes (workers). And I could have workers across different machines someday, so even mutexes are a no-go.
The primary key, which is the first column of such a table, is an autoincrement field. No problem there. The varchar field is the problem, as it is something that the client inserts.
Typical such table is a table of users. For instance:
1 2056
2 1044
3 1896
4 5966
...
Each worker checks whether user "xxxx" exists and, if not, inserts it.
EDIT 2:
Just for reference, in case somebody goes the same route: IB/FB return a pair of error codes (I am using InterBase Express components). Checking for a duplicate value violation looks like this:
except
  on E: EIBInterBaseError do
  begin
    if (E.SQLCode = -803) and (E.IBErrorCode = 335544349) then
    begin
      FKeysConnection.IBT.Rollback;
      EnteredKeys := False;
    end;
  end;
end;
With Firebird you can use the following statement:
UPDATE OR INSERT INTO MY_TABLE (MY_KEY) VALUES (:MY_KEY) MATCHING (MY_KEY) RETURNING MY_ID
assuming there is a BEFORE INSERT trigger which will generate the MY_ID if a NULL value is being inserted.
Here is the documentation.
Update: The above statement will avoid exceptions and cause every statement to succeed. However, in case of many duplicate key values it will also cause many unnecessary updates.
This can be avoided by another approach: just handle the unique constraint exception on the client and ignore it. The details depend on which Delphi library you're using to work with Firebird but it should be possible to examine the SQLCode returned by the server and ignore only the specific case of unique constraint violation.
I do not know if something like this is available in Firebird, but in SQL Server you can do the check as part of inserting the key.
insert into Table1 (KeyValue)
select 'NewKey'
where not exists (select *
from Table1
where KeyValue = 'NewKey')
First option - don't do it.
Don't do it; Unless the WORKERS are doing extraordinary amounts of work (we're talking about computers, so requiring 1 second per record qualifies as "extraordinary amount of work"), just use a single thread; Even better, do all the work in a stored procedure, you'd be amazed by the speedup gained by not transporting data over whatever protocol into your app.
Second option - Use a Queue
Make sure your worker threads don't all work on the same ID. Set up a Queue, push all the ID's that need processing into that queue, have each working thread Dequeue an ID from that Queue. This way you're guaranteed no two workers work on the same record at the same time. This might be difficult to implement if your workers are not all part of the same process.
Last resort
Set up a DB-based "reservation" system so that a worker thread can mark a key as "work in progress" and no two workers work on the same key. I'd set up a table like this:
CREATE TABLE KEY_RESERVATIONS (
KEY INTEGER NOT NULL PRIMARY KEY, /* This is the KEY you'd be reserving; it must be unique so that a second reservation fails */
RESERVED_UNTIL TIMESTAMP NOT NULL /* We don't want to keep reservations for ever in case of failure */
);
Each of your workers would use short transactions to work on that table: select a candidate key, one that's not in the KEY_RESERVATIONS table. Try to INSERT it. Failed? Try another key. Periodically delete all reserved keys with old RESERVED_UNTIL timestamps. Make sure the transactions that work with KEY_RESERVATIONS are as short as possible, so that two threads trying to reserve the same key at the same time fail quickly.
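A rough sketch of the reservation scheme described above is shown below. It is written against SQLite through Python's sqlite3 module rather than Firebird, just so it runs on its own. It assumes the reserved key column is unique (otherwise the second INSERT would not fail), and the try_reserve helper, the column names and the 60-second lifetime are all made up for illustration.

```python
import sqlite3
import time

# Assumption: SQLite stands in for Firebird; reserved_key must be unique.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE key_reservations ("
    " reserved_key INTEGER PRIMARY KEY,"
    " reserved_until REAL NOT NULL)"
)


def try_reserve(conn, key, ttl_seconds=60.0, now=None):
    """Reserve a key in one short transaction; False if someone else holds it."""
    now = time.time() if now is None else now
    try:
        with conn:
            # Drop expired reservations so a crashed worker cannot block a key forever.
            conn.execute(
                "DELETE FROM key_reservations WHERE reserved_until < ?", (now,)
            )
            conn.execute(
                "INSERT INTO key_reservations (reserved_key, reserved_until) "
                "VALUES (?, ?)",
                (key, now + ttl_seconds),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # another worker already reserved this key


# Fixed clock values are passed in so the behaviour is easy to follow.
got_it = try_reserve(conn, 42, now=1000.0)       # reserved until 1060
blocked = try_reserve(conn, 42, now=1010.0)      # still reserved
after_expiry = try_reserve(conn, 42, now=2000.0)  # old reservation expired
```

A worker that gets False simply moves on to another candidate key.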
This is what you have to deal with in an optimistic (or no-) locking scheme.
One way to avoid it is to put a pessimistic lock on the table around the whole select, insert, commit sequence.
However, that means you will have to deal with not being able to access the table (handle table-locked exceptions).
If by workers you mean threads in the same application instance instead of different users (application instances), you will need thread synchronization like kubal5003 says around the select-insert-commit sequence.
A combination of the two is needed if you have multiple users/application instances each with multiple threads.
Synchronize your threads to make it impossible to insert the same value, or use a DB-side key generation method. (I don't know Firebird, so I don't know whether it has one. In MS SQL Server, for example, there is the identity column; GUIDs also solve the problem, because two identical ones are unlikely to be generated.)
You should not rely on the client to generate the unique key if there is a possibility of duplicates.
Use triggers and generators (maybe with help of stored procedure) to create always unique keys.
More information about proper autoinc implementation in Firebird here: http://www.firebirdfaq.org/faq29/