I've got an ASP.NET app using NHibernate to transactionally update a few tables upon a user action. There is a date range involved, whereby only one entry can be made in the 'Booking' table for any given date range (the dates must be exclusive to a single booking).
My problem is how to prevent a race condition whereby two user actions occur almost simultaneously and cause multiple entries in 'Booking' covering the same date. I can't check just prior to calling .Commit() because I think that will still leave me with a race condition?
All I can see is to do a check AFTER the commit and roll the change back manually, but that leaves me with a very bad taste in my mouth! :)
booking_ref (INT) PRIMARY_KEY AUTOINCREMENT
booking_start (DATETIME)
booking_end (DATETIME)
Make the isolation level of your transaction SERIALIZABLE (session.BeginTransaction(IsolationLevel.Serializable)) and do the check and the insert in the same transaction. You should not set the isolation level to Serializable in general, only in situations like this.
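For illustration, a minimal sketch of that check-and-insert in raw T-SQL (the overlap test, variable names and dates are assumptions; in the app you would run the equivalent statements through the NHibernate session inside the serializable transaction, and one of two concurrent transactions may still be blocked or rolled back):

-- Illustrative dates only
DECLARE @start DATETIME = '2024-01-01', @end DATETIME = '2024-01-08';
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
IF NOT EXISTS (SELECT 1 FROM Booking
               WHERE booking_start < @end AND booking_end > @start)
    INSERT INTO Booking (booking_start, booking_end) VALUES (@start, @end);
COMMIT;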
or
lock the table before you check and eventually insert. You can do this by firing a SQL query through nhibernate:
session.CreateSQLQuery("SELECT null as dummy FROM Booking WITH (tablockx, holdlock)").AddScalar("dummy", NHibernateUtil.Int32);
This will lock only that table for selects / inserts for the duration of that transaction.
Hope it helped
The above solutions can be used as an option. When I sent 100 requests using "Parallel.For" with the transaction level set to serializable, there were indeed no duplicated request IDs, but 25 of the transactions failed. That was not acceptable for my client, so we fixed the problem by storing only the request ID and adding a unique index on a separate temporary table.
Your database should manage your data integrity.
You could make your 'date' column unique. Then, if two threads try to book the same date, one will hit a unique key violation and the other will succeed.
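For the schema in the question, that could look something like the following (the constraint name is illustrative, and this only works as-is if bookings are keyed on a single date rather than arbitrary overlapping ranges):

-- Reject a second booking for the same start date at the database level
ALTER TABLE Booking ADD CONSTRAINT uq_booking_start UNIQUE (booking_start);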
My question is probably very specific to Postgres, or maybe not.
A program which I cannot modify has access to Postgres via npgsql and issues a simple SELECT command; that is all I know.
I also have access via npgsql. The table is defined as:
-- Table: public.n_data
-- DROP TABLE public.n_data;
CREATE TABLE public.n_data
(
    u_id integer,
    p_id integer NOT NULL,
    data text,
    CONSTRAINT nc PRIMARY KEY (p_id)
)
WITH (
    OIDS=FALSE
);
ALTER TABLE public.n_data
OWNER TO postgres;
(If that info is useful anyway)
I access one single big column, read from it and write back to it.
This all works fine so far.
The question is: how does Postgres handle it if two clients write to the same data at the same time? Are there any problems there?
And if Postgres does not handle that automatically: what happens when I read the data, process it, the data changes in the meantime, and I then write back the processed data? That would be lost data.
It's a bit tricky to test for data integrity, since this data block is huge and corruptions are hard to find.
I do it with C#, if that means anything.
Locking in most[1] relational databases (including Postgres) is always at row level, never at column level (it's columns and rows in a relational database, not "cells", "fields" or "records").
If two transactions modify the same row, the second one will have to wait until the first one commits or rolls back.
If two transactions modify different rows then they can do that without any problems as long as they don't modify columns that are part of a unique constraint or primary key to the same value.
Read access to data is never blocked in Postgres by regular DML statements. So yes while one transaction modifies data, another one will see the old data until the first transaction commits the changes ("read consistency").
To handle lost updates you can either use the serializable isolation level or make all transactions follow the pattern that they first need to obtain a lock on the row (select ... for update) and hold that until they are finished. Search for "pessimistic locking" to get more details about this pattern.
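A minimal sketch of the pessimistic pattern against the table from the question (the key value is illustrative):

BEGIN;
-- Lock the row; a second transaction running the same SELECT ... FOR UPDATE
-- for this p_id will block here until we commit or roll back
SELECT data FROM public.n_data WHERE p_id = 1 FOR UPDATE;
-- ... process the data in the application ...
UPDATE public.n_data SET data = 'new contents' WHERE p_id = 1;
COMMIT;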
Another option is to include a "modified" timestamp in your table. When a process reads the data it also reads the modification timestamp. When it sends back the new changes it includes a where modified_at = <value obtained when reading> - if the data has changed the condition will not hold true and nothing will be updated and you need to restart your transaction. Search for "optimistic locking" to find more details about this pattern.
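A sketch of the optimistic pattern, assuming you add a modified_at column to the table (key value and timestamp are illustrative):

-- Assumes: ALTER TABLE public.n_data ADD COLUMN modified_at timestamptz;
UPDATE public.n_data
   SET data = 'new contents', modified_at = now()
 WHERE p_id = 1
   AND modified_at = '2024-05-01 12:00:00+00';   -- the value read earlier
-- If this reports "UPDATE 0", another session changed the row in the meantime:
-- re-read, re-process and retry.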
[1] Some DBMS do page locking, and some escalate many row-level locks to a table lock. Neither is the case in Postgres.
How do I write a trigger so that nobody is allowed to rent a movie if their unpaid balance exceeds 50 dollars?
What you have here is a cross-row table constraint - i.e. you can't just put a single Oracle CONSTRAINT on a column, as these can only look at data within a single row at a time.
Oracle has support for only two cross-row constraint types - uniqueness (e.g. primary keys and unique constraints) and referential integrity (foreign keys).
In your case, you'll have to hand-code the constraint yourself - and with that comes the responsibility to ensure that the constraint is not violated in the presence of multiple sessions, each of which cannot see data inserted/updated by other concurrent sessions (at least, until they commit).
A simplistic approach is to add a trigger that issues a query to count how many records conflict with the new record; but this won't work, because the trigger cannot see rows that have been inserted/updated by other sessions but not yet committed; so the trigger will sometimes allow members to exceed the limit anyway, as long as (for example) they get two cashiers to enter the data on separate terminals.
One way to get around this problem is to put some element of serialization in - e.g. the trigger would first request a lock on the member record (e.g. with a SELECT FOR UPDATE) before it's allowed to check the rentals; that way, if a 2nd session tries to insert rentals, it will wait until the first session does a commit or rollback.
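A minimal sketch of that serialization step, assuming hypothetical MEMBER and RENTAL tables where MEMBER.UNPAID_BALANCE holds the member's outstanding amount (all names are assumptions, not your actual schema):

CREATE OR REPLACE TRIGGER trg_check_unpaid_balance
BEFORE INSERT ON rental
FOR EACH ROW
DECLARE
    v_balance member.unpaid_balance%TYPE;
BEGIN
    -- Lock the member row so concurrent rentals for the same member serialize here
    SELECT unpaid_balance
      INTO v_balance
      FROM member
     WHERE member_id = :NEW.member_id
       FOR UPDATE;

    IF v_balance > 50 THEN
        RAISE_APPLICATION_ERROR(-20001, 'Unpaid balance exceeds 50 dollars; rental not allowed');
    END IF;
END;
/

Because the check reads a different table (member) than the one being modified (rental), it avoids the mutating-table problem while still serializing concurrent rentals for the same member.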
Another way around this problem is to use an aggregating Materialized View, which would be based on a query that is designed to find any rows that fail the test; the expectation is that the MV will be empty, and you put a table constraint on the MV such that if a row was ever to appear in the MV, the constraint would be violated. The effect of this is that any statement that tries to insert rows that violate the constraint will cause a constraint violation when the MV is refreshed.
Writing the query for this based on your design is left as an exercise for the reader :)
If you want to restrict something about your table data, then you should have a look at constraints, not triggers.
Constraints ensure certain conditions about your table data, like in your example.
Triggers are fired when some action (i.e. INSERT, UPDATE, DELETE) takes place, and you can then do some work as a reaction to that action.
In an app that I am developing, we have users who will make deposits into the app and that becomes their balance.
They can use the balance to perform certain actions and can also withdraw the balance. What schema design ensures that the user can never withdraw or spend more than they have, even under concurrency?
For example:
CREATE TABLE user_transaction (
transaction_id SERIAL NOT NULL PRIMARY KEY,
change_value BIGINT NOT NULL,
user_id INT NOT NULL REFERENCES user
)
The above schema can keep track of the balance (SELECT SUM(change_value) FROM user_transaction). However, this does not hold up under concurrency, because the user can post two requests simultaneously and two records could be inserted over two simultaneous database connections.
I can't do in-app locking either (to ensure only one transaction gets written at a time), because I run multiple web servers.
Is there a database schema design that can ensure correctness?
P.S. Off the top of my head, I can imagine leveraging the uniqueness constraint in SQL. By having later transaction reference earlier transactions, and since each earlier transaction can only be referenced once, that ensures correctness at the database level.
Relying on calculating an account balance every time you go to insert a new transaction is not a very good design - for one thing, as time goes by it will take longer and longer, as more and more rows appear in the transaction table.
A better idea is to store the current balance in another table - either a new table, or in the existing users table that you are already using as a foreign key reference.
It could look like this:
CREATE TABLE users (
user_id INT PRIMARY KEY,
balance BIGINT NOT NULL DEFAULT 0 CHECK(balance>=0)
);
Then, whenever you add a transaction, you update the balance like this:
UPDATE users SET balance = balance + $1 WHERE user_id = $2;
You must do this inside a transaction, in which you also insert the transaction record.
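For example (Postgres syntax; $1 and $2 are the change value and user id bound from the application, as above):

BEGIN;
INSERT INTO user_transaction (change_value, user_id) VALUES ($1, $2);
UPDATE users SET balance = balance + $1 WHERE user_id = $2;  -- fails if balance would drop below 0
COMMIT;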
Concurrency issues are taken care of automatically: if you attempt to update the same record twice from two different transactions, then the second one will be blocked until the first one commits or rolls back. The default transaction isolation level of 'Read Committed' ensures this - see the manual section on concurrency.
You can issue the whole sequence from your application, or if you prefer you can add a trigger to the user_transaction table such that whenever a record is inserted into the user_transaction table, the balance is updated automatically.
That way, the CHECK clause ensures that no transactions can be entered into the database that would cause the balance to go below 0.
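If you go with the trigger alternative, a minimal sketch could look like this (Postgres; the function and trigger names are illustrative):

CREATE OR REPLACE FUNCTION apply_transaction() RETURNS trigger AS $$
BEGIN
    -- The CHECK(balance >= 0) constraint makes this fail if funds are insufficient
    UPDATE users SET balance = balance + NEW.change_value WHERE user_id = NEW.user_id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER user_transaction_balance
AFTER INSERT ON user_transaction
FOR EACH ROW EXECUTE PROCEDURE apply_transaction();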
I have a table named 'games', which contains a column named 'title'. This column is unique; the database is PostgreSQL.
I have a user input form that allows the user to insert a new 'game' into the 'games' table. The function that inserts a new game checks whether a previously entered 'game' with the same 'title' already exists; for this, I get the count of rows with the same game 'title'.
I use a transaction for this: the insert function starts with BEGIN, gets the row count, inserts the new row if the count is 0, and after the process is completed it COMMITs the changes.
The problem is that two games with the same title, if submitted by users at the same time, could be inserted twice, since I only get the count of rows to check for duplicate records and each of the transactions is isolated from the other.
I thought of locking the tables when getting the row count as:
LOCK TABLE games IN ACCESS EXCLUSIVE MODE;
SELECT count(id) FROM games WHERE games.title = 'new_game_title'
This would lock the table for reads too (which means the other transaction would have to wait until the current one completes successfully). This would solve the problem, I suspect. Is there a better way around this (avoiding duplicate games with the same title)?
You should NOT need to lock your tables in this situation.
Instead, you can use one of the following approaches:
Define a UNIQUE index on the column that really must be unique (see the sketch below). In this case, the first transaction will succeed and the second will error out.
Define an AFTER INSERT OR UPDATE OR DELETE trigger that checks your condition and, if it does not hold, RAISEs an error, which will abort the offending transaction.
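A minimal sketch of the first (unique index) option; the index name is illustrative:

CREATE UNIQUE INDEX games_title_unique ON games (title);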
In all these cases, your client code should be ready to properly handle possible failures (like failed transactions) that could be returned by executing your statements.
Using the highest transaction isolation level (Serializable), you can achieve something similar to what you are actually asking for. But be aware that this may fail with: ERROR: could not serialize access due to concurrent update.
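Roughly like this (Postgres syntax; the failing transaction then has to be retried by the application):

BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT count(id) FROM games WHERE title = 'new_game_title';
-- insert only if the count above was 0; under concurrency one of the
-- transactions may fail with a serialization error and must be retried
INSERT INTO games (title) VALUES ('new_game_title');
COMMIT;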
I do not agree with the constraint approach entirely. You should have a constraint to protect data integrity, but relying on the constraint alone forces you to identify not only that an error occurred, but which constraint caused it. The trouble is not catching the error, as some have discussed, but identifying what caused the error and providing a human-readable reason for the failure. Depending on which language your application is written in, this can be next to impossible, e.g. telling the user "Game title [foo] already exists" instead of "game must have a price" for a separate constraint.
There is a single statement alternative to your two stage approach:
INSERT INTO games ( title, [column2], ... )
SELECT 'new_game_title', [value2], ...
WHERE NOT EXISTS ( SELECT 1 FROM games AS g2 WHERE g2.title = 'new_game_title' );
I want to be clear with this... this is not an alternative to having a unique constraint (which requires extra data for the index). You must have one to protect your data from corruption.
I want to insert a row, but if a conflict occurs (example below) I'd like the database to lock the existing row so I can log its contents for debugging purposes. I am using READ_COMMITTED transaction isolation.
For example:
CREATE TABLE users(id BIGINT AUTO_INCREMENT, username VARCHAR(30),
count INT NOT NULL, PRIMARY KEY(id), UNIQUE(username));
Thread 1:
INSERT INTO users(username, count) VALUES('joe', 1000);
transaction.commit();
Thread 2:
// Insert fails due to conflict with above record
INSERT INTO users(username, count) VALUES('joe', 0);
// Get the conflicting row and log its properties
SELECT * FROM users WHERE username = 'joe';
If the conflicting row is not locked, it may be modified by the time I check it. The only workaround I found is issuing SELECT id FROM users WHERE username = 'joe' FOR UPDATE before the insert. Is there a way to implement this without any overhead when a conflict does not occur?
UPDATE: I am not asking to avoid the conflict or the resulting SQLException. I am just asking for the conflicting row to get locked so I can look up what values triggered the conflict. Yes, I know that the conflicting record contains joe but I want to log all its other columns.
No, it is not possible to eliminate the conflict on a UNIQUE column when using INSERT of rows with unique column(s).
Trying to write SQL that never has to deal with SQL exceptions is just wasted effort that always ends up creating SQL that fails under some conditions. Exception handling can't be avoided when dealing with real-time, multi-threaded, multi-user database servers, unless you can afford to lock the table, do the update, and unlock the table (which will create terrible performance under heavy load from many users).
The UNIQUE constraint violation exception will ALWAYS occur on the 2nd INSERT, as the two INSERTs in your example could be widely separated in time (e.g. by hours, days or weeks); table or row locking won't change this.
This problem is one that should be solved at the GUI level anyway: choosing a "user name" that may already have been chosen by a previous user requires giving the "new" user feedback like "Sorry, that user name is already in use by another user", so it would seem unlikely that handling the UNIQUE violation exception can or should ever be "avoided".
In addition, there is no reason to SELECT ... FOR UPDATE, since all you need to do is SELECT id FROM users WHERE username = newName and see if you get a resulting id or null; (id == null) => user name not in use. But even then, two users could both get the "not in use" result at the same time and one of the INSERTs could still fail.
When the UNIQUE exception is returned on the duplicate INSERT, the second INSERT has failed and that record was not created, so there is no "duplicate" record to lock and then read after the UNIQUE exception is returned on the failed INSERT.
Which version of SQL are you using? I'm not sure if I understand your question correctly, but I think you could do this in a trigger.
In the trigger, you can view the inserted value (your conflicting row), log it, and roll back. This means that when you insert your row and no conflict occurs, you don't have to commit anything extra, and when a conflict occurs, the log is made and the row is not inserted.
No, most databases do not support that kind of operation.
You can do tricks like creating an explicit transaction
BEGIN TRANSACTION
IF EXISTS (SELECT ...)
    ROLLBACK
ELSE
BEGIN
    INSERT INTO ...
    COMMIT
END
But that isn't exactly what you want. The only way to achieve what you're asking for is to use one of the B-tree-style libraries, which are a lot more low-level.
There doesn't seem to be a portable way of doing this and looking at MVCC there is a strong indication that this cannot be implemented without a substantial performance impact.
So in conclusion: you're going to have to settle for knowing that a conflict occurred, but you have no way of being 100% sure of the cause (there is no thread-safe way to verify it).