Reserve Autoincrement ID Within Transaction (SQL)

Say I have two tables, users and groups, each of which has an auto-incrementing primary key. Whenever a new user is created, I would like to create a group along with that user (at the same time/same transaction). However, the users need to know which group they belong to, so each user stores a group_id.
In order to do this, I create and insert a group into the database, find out what the primary key of that group I just inserted was, and finally insert the user with that new group's primary key.
Note that I need to commit the group to the database (and thus keep it outside any transaction that also commits the user) in order to retrieve the primary key it was assigned.
Although this will work in most situations, if there is some kind of failure (power failure, system crash, etc.) between when I insert the group and find out its primary key, and when I insert the user, I will end up with an inconsistent database.
Is there a way to do something like reserving a primary key temporarily so that if the system crashes, it won't end up in an inconsistent state?
I'm primarily concerned with MySQL databases but if there is something in standard SQL which will allow me to do this (and is thus, compatible with other database backends), I'm also interested in knowing that.

Easy, just put both operations in a transaction. Start the transaction, create the group, create the user, then commit the transaction.
SET autocommit = 0;
START TRANSACTION;
INSERT INTO Groups ...;
INSERT INTO Users ...;
COMMIT;
You would have to be using an engine that supports transactions, such as InnoDB, for your tables in order for that to work though. The older MyISAM engine (the default before MySQL 5.5) does not support transactions.
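A concrete sketch of that pattern (the name columns and values are illustrative; LAST_INSERT_ID() is discussed in more detail in the last answer below):
START TRANSACTION;
INSERT INTO Groups (name) VALUES ('New group');
-- LAST_INSERT_ID() is tracked per connection, so it returns the id
-- generated by the INSERT above even with concurrent clients
INSERT INTO Users (name, group_id) VALUES ('New user', LAST_INSERT_ID());
COMMIT;
-- A crash before COMMIT rolls everything back, so no orphan group remains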

If you use transactions then you'll have no problem.

Related

How to know if an uncommitted transaction tries to insert a specific unique key into SQL

I'm writing a program which inserts data into a MariaDB server and can be used by different people at the same time. The transactions take some time, so the following problem might occur: person A starts a transaction with primary key "c" and, while the transaction is still uncommitted, person B wants to insert data with the same primary key "c". How can I prevent B from starting its transaction with a primary key that A already uses in its uncommitted transaction?
I use MariaDB as the database and InnoDB as the engine.
I've checked the isolation levels but couldn't figure out how to use them to solve my problem.
Thanks!
It has nothing to do with transaction isolation levels. It's about locking.
Any insert/update/delete to a specific entry in an index locks that entry. Locks are granted first-come, first-served. The next session that tries to do an insert/update/delete to the same index entry will be blocked.
You can demo this yourself. Open two MySQL client windows side by side.
First window:
mysql> START TRANSACTION;
mysql> INSERT INTO mytable SET c = 42;
Then don't commit yet.
Second window:
mysql> INSERT INTO mytable SET c = 42;
Notice that it hangs at this point, waiting for the lock.
First window:
mysql> commit;
Second window finally returns:
ERROR 1062 (23000): Duplicate entry '42' for key 'PRIMARY'
Every table should have a PRIMARY KEY. In MySQL, the PRIMARY KEY is, by definition, UNIQUE.
You can also have UNIQUE keys declared on the table.
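For illustration, the mytable used in the demo above could be declared like this (the email column is an invented example of an additional UNIQUE key):
CREATE TABLE mytable (
    c INT NOT NULL PRIMARY KEY,    -- unique by definition
    email VARCHAR(100),
    UNIQUE KEY uk_email (email)    -- an additional UNIQUE key
) ENGINE=InnoDB;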
Each connection should be doing this to demarcate a transaction:
BEGIN;
various SQL statements
COMMIT;
If any of those SQL statements inserts a row, it uses the unique key(s) to block others from inserting the same unique value into that table. A blocked session ends up with some form of error: a deadlock (fatal to the transaction), or a "lock wait timeout", which it might recover from by retrying, etc.
Note: If you have any SELECTs in the transaction, you may need to stick FOR UPDATE on the end of them. This signals what rows you might change in the transaction, thereby giving other connections a heads-up to stay out of the way.
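For example, again with the hypothetical mytable (the email update reuses the invented column from the sketch above):
BEGIN;
-- Lock the row up front; other writers (and other FOR UPDATE readers)
-- block on this row until the COMMIT below
SELECT * FROM mytable WHERE c = 42 FOR UPDATE;
UPDATE mytable SET email = 'new@example.com' WHERE c = 42;
COMMIT;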
Can you find out if any of this is going on? Not really. But why bother? Simply plow ahead and do what you need to do. But check for errors to see if some other connection prevented you from doing it.
Think of it as "optimistic" coding.
Leave the isolation level alone; it only adds confusion to typical tasks.
Primary keys are internal values that ensure uniqueness of rows and are not meant to be exposed to the external world.
Generate your primary keys using IDENTITY columns or using SEQUENCEs. They will handle multiple simultaneous inserts gracefully and will assign each one different values.
Using IDENTITY:
CREATE TABLE house (
    id INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT,
    address VARCHAR(40) NOT NULL
);
INSERT INTO house (address) VALUES ('123 Maple Street');
Using a SEQUENCE:
CREATE SEQUENCE myseq1;
CREATE TABLE house (
    id INTEGER NOT NULL PRIMARY KEY,
    address VARCHAR(40) NOT NULL
);
INSERT INTO house (id, address) VALUES (NEXT VALUE FOR myseq1, '123 Maple Street');

Postgres access a single column by two different programs

My question may be very specific to Postgres, or it may not be.
A program which I cannot modify accesses Postgres via Npgsql with a simple SELECT command; that is all I know about it.
I also have access via npgsql. The table is defined as:
-- Table: public.n_data
-- DROP TABLE public.n_data;
CREATE TABLE public.n_data
(
    u_id integer,
    p_id integer NOT NULL,
    data text,
    CONSTRAINT nc PRIMARY KEY (p_id)
)
WITH (
    OIDS=FALSE
);
ALTER TABLE public.n_data
    OWNER TO postgres;
(If that info is useful anyway)
I access one single big column, read from it and write back to it.
This all works fine so far.
The question is: how does Postgres handle it if we both write at the same time?
Any problems there?
And if Postgres does not handle that automatically: what about when I read the data, process it, the data changes in the meantime, and I write my version back after processing ---> lost data.
It's a bit tricky to test for data integrity, since this data block is huge and corruption is hard to find.
I do it with C#, if that means anything.
Locking in most¹ relational databases (including Postgres) is always on row level, never on column level (a relational database has columns and rows, not "cells", "fields" or "records").
If two transactions modify the same row, the second one will have to wait until the first one commits or rolls back.
If two transactions modify different rows then they can do that without any problems as long as they don't modify columns that are part of a unique constraint or primary key to the same value.
Read access to data is never blocked in Postgres by regular DML statements. So yes, while one transaction modifies data, another one will see the old data until the first transaction commits the changes ("read consistency").
To handle lost updates you can either use the serializable isolation level or make all transactions follow the pattern that they first need to obtain a lock on the row (select ... for update) and hold that until they are finished. Search for "pessimistic locking" to get more details about this pattern.
Another option is to include a "modified" timestamp in your table. When a process reads the data it also reads the modification timestamp. When it sends back the new changes it includes a where modified_at = <value obtained when reading> - if the data has changed the condition will not hold true and nothing will be updated and you need to restart your transaction. Search for "optimistic locking" to find more details about this pattern.
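A minimal sketch of both patterns against the n_data table from the question (the modified_at column in the optimistic variant is an assumed addition to the table, and the literal values are placeholders):
-- Pessimistic: lock the row, hold the lock until COMMIT
BEGIN;
SELECT data FROM public.n_data WHERE p_id = 1 FOR UPDATE;
-- ... process the data in the application ...
UPDATE public.n_data SET data = 'processed contents' WHERE p_id = 1;
COMMIT;

-- Optimistic: detect concurrent changes via the assumed modified_at column
UPDATE public.n_data
SET data = 'processed contents', modified_at = now()
WHERE p_id = 1
AND modified_at = '2021-01-01 12:00:00';  -- the value read earlier
-- 0 rows updated means another writer got there first: re-read and retry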
¹ Some DBMS do page locking, and some escalate many row-level locks to a table lock. Neither is the case in Postgres.

How to do an INSERT into tables with circular relationships (SQL SERVER)

I'm dealing with a set of tables in a database that appear to have a circular relationship (see image). This is the ARTS database if that is of any help to anyone.
A user signing on:
a) must create a (insert into) session, which in turn needs a SessionStartTransactionID (=SignOnTransaction)
b) a SignOnTransaction is a type of ControlTransaction
c) a ControlTransaction is a type of Transaction
d) a Transaction needs a reference to an existing Session (along with Operator, etc.)
Note:
The Transaction.SessionStartTransactionID, Transaction.OperatorID, and Transaction.WorkStationID columns (those three form the composite primary key in Session) cannot be NULL in the Transaction table.
I can't figure out how to create (insert into) SignOnTransaction, or insert into any of the tables mentioned above.
How do you do that in SQL Server? Is that even possible?
Where would I start?
Thanks!
If something you're describing is impossible, then you're understanding it wrong. You can't have a table A with a required key that references table B, which in turn has a required key that references table A. One of the two keys has to be nullable, or the foreign key relationships aren't being enforced.
Some ideas
Given that Session uses StartTransactionID as part of its primary key, it cannot be NULL there, so it seems likely that StartTransactionID in Transaction can be NULL. In that case you insert the Transaction, then the ControlTransaction, then the SignOnTransaction, then the Session, and finally update the Transaction you created with the new id. (If the FK is not enforced, you can skip the update and just use the same value for the PK, provided it isn't an IDENTITY column.)
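A sketch of that insert-then-update order in T-SQL (every column name here is a guess at the ARTS schema and purely illustrative, and it assumes TransactionID is an IDENTITY column):
BEGIN TRANSACTION;

DECLARE @OperatorID INT = 1, @WorkStationID INT = 1;  -- example values

-- Insert the Transaction with a NULL SessionStartTransactionID for now
INSERT INTO [Transaction] (SessionStartTransactionID, OperatorID, WorkStationID)
VALUES (NULL, @OperatorID, @WorkStationID);

DECLARE @TxnID INT = SCOPE_IDENTITY();  -- id of the Transaction just inserted

INSERT INTO ControlTransaction (TransactionID) VALUES (@TxnID);
INSERT INTO SignOnTransaction (ControlTransactionID) VALUES (@TxnID);

INSERT INTO [Session] (SessionStartTransactionID, OperatorID, WorkStationID)
VALUES (@TxnID, @OperatorID, @WorkStationID);

-- Now that the Session row exists, point the Transaction back at it
UPDATE [Transaction]
SET SessionStartTransactionID = @TxnID
WHERE TransactionID = @TxnID;

COMMIT TRANSACTION;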
The only other possible solution I can think of is to run an ALTER TABLE [Transaction] NOCHECK CONSTRAINT <name of the StartTransactionID FK constraint> every time you first insert into Transaction, and then restore the constraint after you update the table. That seems like a hackish solution, to be sure, especially because disabling the constraint affects every other session as well, so you leave yourself open to a ton of problems.
...
Since this appears to be part of a production system, why don't you run a SQL Trace to see how the data is getting populated? This helps me all the time.

Database Schema Design: Tracking User Balance with concurrency

In an app that I am developing, we have users who will make deposits into the app and that becomes their balance.
They can use the balance to perform certain actions and can also withdraw it. What is a schema design that ensures the user can never withdraw, or spend, more than he has, even under concurrency?
For example:
CREATE TABLE user_transaction (
    transaction_id SERIAL NOT NULL PRIMARY KEY,
    change_value BIGINT NOT NULL,
    user_id INT NOT NULL REFERENCES users
);
The above schema can keep track of the balance (SELECT SUM(change_value) FROM user_transaction). However, this does not hold up under concurrency, because the user can post two requests simultaneously and two records could be inserted on two simultaneous database connections.
I can't do in-app locking either (to ensure only one transaction gets written at a time), because I run multiple web servers.
Is there a database schema design that can ensure correctness?
P.S. Off the top of my head, I can imagine leveraging the uniqueness constraint in SQL: by having each later transaction reference the one before it, and since each earlier transaction can only be referenced once, that ensures correctness at the database level.
Relying on calculating an account balance every time you go to insert a new transaction is not a very good design - for one thing, as time goes by it will take longer and longer, as more and more rows appear in the transaction table.
A better idea is to store the current balance in another table - either a new table, or in the existing users table that you are already using as a foreign key reference.
It could look like this:
CREATE TABLE users (
    user_id INT PRIMARY KEY,
    balance BIGINT NOT NULL DEFAULT 0 CHECK(balance >= 0)
);
Then, whenever you add a transaction, you update the balance like this:
UPDATE users SET balance = balance + $1 WHERE user_id = $2;
You must do this inside a transaction, in which you also insert the transaction record.
Concurrency issues are taken care of automatically: if you attempt to update the same record twice from two different transactions, then the second one will be blocked until the first one commits or rolls back. The default transaction isolation level of 'Read Committed' ensures this - see the manual section on concurrency.
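A sketch of that sequence, using the two tables above (-50 stands for an example withdrawal amount, and the $1/$2 placeholders are replaced with literals):
BEGIN;
-- Record the movement ...
INSERT INTO user_transaction (change_value, user_id) VALUES (-50, 1);
-- ... and update the cached balance in the same database transaction.
-- If CHECK(balance >= 0) fails here, the INSERT above rolls back too.
UPDATE users SET balance = balance + (-50) WHERE user_id = 1;
COMMIT;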
You can issue the whole sequence from your application, or if you prefer you can add a trigger to the user_transaction table such that whenever a record is inserted into the user_transaction table, the balance is updated automatically.
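A sketch of the trigger variant in PL/pgSQL (on Postgres versions before 11, write EXECUTE PROCEDURE instead of EXECUTE FUNCTION):
CREATE OR REPLACE FUNCTION apply_transaction() RETURNS trigger AS $$
BEGIN
    -- Blocks if another transaction currently holds this user's row
    UPDATE users
    SET balance = balance + NEW.change_value
    WHERE user_id = NEW.user_id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER user_transaction_balance
AFTER INSERT ON user_transaction
FOR EACH ROW EXECUTE FUNCTION apply_transaction();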
That way, the CHECK clause ensures that no transactions can be entered into the database that would cause the balance to go below 0.

How can I get the Primary Key id of a file I just INSERTED?

Earlier today I asked this question which arose from A- My poor planning and B- My complete disregard for the practice of normalizing databases. I spent the last 8 hours reading about normalizing databases and the finer points of JOIN and worked my way through the SQLZoo.com tutorials.
I am enlightened. I understand the purpose of database normalization and how it can suit me. Except that I'm not entirely sure how to execute that vision from a procedural standpoint.
Here's my old vision: 1 table called "files" that held, let's say, a file id, a file url, and the appropriate grade levels for that file.
New vision!: 1 table for "files", 1 table for "grades", and a junction table to mediate.
But that's not my problem. This is a really basic question that I'm sure has an obvious answer: when I create a record in "files", it gets assigned the incremented primary key automatically (file_id). However, from now on I'm going to need to write that file_id to the other tables as well. Because I don't assign that id manually, how do I know what it is?
If I upload text.doc and it gets file_id 123, how do I know it got 123 in order to write it to "grades" and the junction table? I can't do a max(file_id) because if you have concurrent users, you might nab a different id. I just don't know how to get the file_id value without having manually assigned it.
You may want to use LAST_INSERT_ID() as in the following example:
START TRANSACTION;
INSERT INTO files (file_id, url) VALUES (NULL, 'text.doc');
INSERT INTO grades (file_id, grade) VALUES (LAST_INSERT_ID(), 'some-grade');
COMMIT;
The transaction ensures that the operation remains atomic: either both inserts complete successfully or neither does. This is optional, but recommended in order to maintain the integrity of the data.
For LAST_INSERT_ID(), the most recently generated ID is maintained in the server on a per-connection basis. It is not changed by another client. It is not even changed if you update another AUTO_INCREMENT column with a nonmagic value (that is, a value that is not NULL and not 0). Using LAST_INSERT_ID() and AUTO_INCREMENT columns simultaneously from multiple clients is perfectly valid. Each client will receive the last inserted ID for the last statement that client executed.
Source and further reading:
MySQL Reference: How to Get the Unique ID for the Last Inserted Row
MySQL Reference: START TRANSACTION, COMMIT, and ROLLBACK Syntax
In PHP, to get the automatically generated ID of a MySQL record, use the mysqli->insert_id property of your mysqli object.
How are you going to find the entry tomorrow, after your program has forgotten the value of last_insert_id()?
Using a surrogate key is fine, but your table still represents an entity, and you should be able to answer the question: what measurable properties define this particular entity? The set of those properties is the natural key of your table, and even if you use surrogate keys, such a natural key should always exist and you should use it to retrieve information from the table. Use the surrogate key to enforce referential integrity, for indexing purposes, and to make joins easier on the eye. But don't let surrogate keys escape from the database.
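For instance, with the files table from the accepted answer, the url is a plausible natural key (assuming it is declared UNIQUE): tomorrow you would find the row by that, not by a remembered surrogate id:
SELECT file_id FROM files WHERE url = 'text.doc';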