Excluding a table from a transaction rollback - sql

We have a table and a set of procedures that are used for generating PK ids. The table holds the last id, and the procedures get the id, increment it, update the table, and then return the newly incremented id.
This procedure could potentially be within a transaction. The problem is that if we have a rollback, it could potentially roll back to an id that is before any ids that came into use during the transaction (say, generated from a different user or thread). Then when the id is incremented again, it will cause duplicates.
Is there any way to exclude the id generating table from a parent transaction to prevent this from happening?
To add detail to our current problem...
First, we have a system we are preparing to migrate a lot of data into. The system consists of a ms-sql (2008) database and a textml database. The sql database houses data less than 3 days old, while the textml acts as an archive for anything older. The textml db also relies on the sql db to provide ids for particular fields. These fields are currently Identity PKs, and are generated on insertion before publishing to the textml db. We do not want to wash all our migrated data through sql, since the records will flood the current system, both in terms of traffic and data. But at the same time we have no way of generating these ids, since they are auto-incremented values that sql server controls.
Secondly, we have a system requirement which needs us to be able to pull an old asset out of the textml database and insert it back into the sql database with the original ids. This is done for correction and editing purposes, and if we alter the ids it will break relations downstream on client systems which we have no control over. Of course all this is an issue because the id columns are Identity columns.

procedures get the id, increment it, update the table, and then return the newly incremented id
This will cause deadlocks. The procedure must increment and return the id in one single, atomic step, e.g. by using the OUTPUT clause in SQL Server:
update ids
set id = id + 1
output inserted.id
where name = @name;
You don't have to worry about concurrency. The fact that you generate ids this way implies that only one transaction at a time can increment an id, because the update locks the row exclusively. You cannot get duplicates. You do get complete serialization of all operations (i.e. poor performance and low throughput), but that is a different issue. And this is why you should use the built-in mechanisms for generating sequences and identities. These are specific to each platform: AUTO_INCREMENT in MySQL, SEQUENCE in Oracle, IDENTITY and SEQUENCE in SQL Server (SEQUENCE only in Denali), etc.
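If you are on a version that has SEQUENCE (SQL Server 2012/Denali or later; the question's 2008 instance would not), id generation can be moved off the manually updated table entirely. A minimal sketch, with an illustrative sequence name:

-- One sequence per id family; the name is an assumption for the example.
CREATE SEQUENCE dbo.RecordIds
    START WITH 1
    INCREMENT BY 1;

-- Each call hands out the next value atomically. Consumed values are not
-- returned on rollback, so concurrent transactions never block each other
-- on id generation and duplicates cannot occur.
DECLARE @newId bigint;
SET @newId = NEXT VALUE FOR dbo.RecordIds;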
Updated
As I read your edit, the only reason why you want control of the generated identities is to be able to insert back archived records. This is already possible; simply use IDENTITY_INSERT:
Allows explicit values to be inserted into the identity column of a table
Turn it on when you insert back the old record, then turn it back off:
SET IDENTITY_INSERT recordstable ON;
INSERT INTO recordstable (id, ...) values (@oldid, ...);
SET IDENTITY_INSERT recordstable OFF;
As for why manually generated ids serialize all operations: any transaction that generates an id will exclusively lock the row in the ids table. No other transaction can read or write that row until the first transaction commits or rolls back. Therefore there can be only one transaction generating an id on a table at any moment, i.e. serialization.
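To picture the serialization, here is a minimal two-session sketch against the ids table from the example above (the @name parameter is assumed):

-- Session 1: holds an exclusive lock on the row until it finishes.
BEGIN TRANSACTION;
update ids set id = id + 1 output inserted.id where name = @name;
-- ... the rest of this transaction's work ...

-- Session 2: this update blocks here and only proceeds once session 1
-- commits or rolls back, so ids are handed out strictly one at a time.
BEGIN TRANSACTION;
update ids set id = id + 1 output inserted.id where name = @name;
COMMIT;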

Related

Postgres access a single column by two different programs

My question is probably very specific to Postgres, or probably not.
A program which I cannot modify has access to Postgres via npgsql and a simple select command; that's all I know.
I also have access via npgsql. The table is defined as:
-- Table: public.n_data
-- DROP TABLE public.n_data;
CREATE TABLE public.n_data
(
u_id integer,
p_id integer NOT NULL,
data text,
CONSTRAINT nc PRIMARY KEY (p_id)
)
WITH (
OIDS=FALSE
);
ALTER TABLE public.n_data
OWNER TO postgres;
(If that info is useful anyway)
I access one single big column, read from it and write back to it.
This all works fine so far.
The question is: how does Postgres handle it if we write at the same time?
Any problems there?
And if Postgres does not handle that automatically, what about when I read the data, process it, and in the meantime the data changes, and I then write back the processed data ---> lost data.
It's a bit tricky to test for data integrity, since this data block is huge and corruptions are hard to find.
I do it with C#, if that means anything.
Locking in most¹ relational databases (including Postgres) is always on row level, never on column level (it's columns and rows in a relational database, not "cells", "fields" or "records").
If two transactions modify the same row, the second one will have to wait until the first one commits or rolls back.
If two transactions modify different rows then they can do that without any problems as long as they don't modify columns that are part of a unique constraint or primary key to the same value.
Read access to data is never blocked in Postgres by regular DML statements. So yes, while one transaction modifies data, another one will see the old data until the first transaction commits the changes ("read consistency").
To handle lost updates you can either use the serializable isolation level or make all transactions follow the pattern that they first need to obtain a lock on the row (select ... for update) and hold that until they are finished. Search for "pessimistic locking" to get more details about this pattern.
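A minimal pessimistic-locking sketch against the n_data table from the question (the p_id value is just an example):

BEGIN;

-- Lock the row up front; another transaction running SELECT ... FOR UPDATE
-- (or UPDATE) on the same row waits here until we commit or roll back.
SELECT data
FROM public.n_data
WHERE p_id = 1
FOR UPDATE;

-- ... read the data, process it in the application ...

UPDATE public.n_data
SET data = 'processed value'
WHERE p_id = 1;

COMMIT;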
Another option is to include a "modified" timestamp in your table. When a process reads the data it also reads the modification timestamp. When it sends back the new changes it includes a where modified_at = <value obtained when reading> - if the data has changed in the meantime, the condition will not hold true, nothing will be updated, and you need to restart your transaction. Search for "optimistic locking" to find more details about this pattern.
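An optimistic-locking sketch along those lines, assuming a modified_at column has been added to the table:

-- One-time schema change (assumed for this example):
-- ALTER TABLE public.n_data ADD COLUMN modified_at timestamptz NOT NULL DEFAULT now();

UPDATE public.n_data
SET data = 'processed value',
    modified_at = now()
WHERE p_id = 1
  AND modified_at = '2024-01-01 12:00:00+00';  -- the timestamp read earlier

-- If this reports 0 rows updated, someone else changed the row in the
-- meantime: re-read, re-process, and try again.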
¹ Some DBMS do page locking, and some escalate many row-level locks to a table lock. Neither is the case in Postgres.

Insert & Delete from SQL best practice

I have a database with 2 tables: CurrentTickets & ClosedTickets. When a user creates a ticket via the web application, a new row is created. When the user closes a ticket, the row from CurrentTickets is inserted into ClosedTickets and then deleted from CurrentTickets. If a user reopens a ticket, the same thing happens, only in reverse.
The catch is that one of the columns being copied back to CurrentTickets is the PK column (TicketID), which has identity set to ON.
I know I can set IDENTITY_INSERT to ON, but as I understand it this is generally frowned upon. I'm assuming that my database is a bit poorly designed. Is there a way for me to accomplish what I need without using IDENTITY_INSERT? How would I keep the TicketID column auto-incremented without making it an identity column? I figure I could add another column RowID and make that the PK, but I still want the TicketID column to auto-increment if possible while not being considered an identity column.
This just seems like bad design with 2 tables. Why not just have a single tickets table that stores all tickets? Then add a column called IsClosed, which is false by default. Once a ticket is closed you simply update the value to true, and you don't have to do any copying to and from other tables.
All of your code around this part of your application will be much simpler and easier to maintain with a single table for tickets.
The simple answer is: DO NOT make it an Identity column if you want influence over the next id generated in that column.
Also, I think you have a really poor schema. Rather than having two tables, just add another column to your CurrentTickets table, something like Open BIT, set its value to 1 by default, and change the value to 0 when the client closes the ticket.
And you can turn it on/off as many times as the client changes his mind, without having to go through all the trouble of IDENTITY_INSERT and managing a whole separate table.
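A minimal sketch of that single-table approach, assuming the existing CurrentTickets table is kept as the one tickets table (the @TicketID parameter is illustrative):

-- One-time schema change: a status flag instead of a second table.
ALTER TABLE dbo.CurrentTickets ADD [Open] BIT NOT NULL DEFAULT 1;

-- Closing a ticket:
UPDATE dbo.CurrentTickets SET [Open] = 0 WHERE TicketID = @TicketID;

-- Reopening it: identity untouched, no rows copied anywhere.
UPDATE dbo.CurrentTickets SET [Open] = 1 WHERE TicketID = @TicketID;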
Update
Since you have now mentioned it's SQL Server 2014, you have access to something called a Sequence object.
You define the object once, and then every time you want a sequential number you just select the next value from it; it is a kind of hybrid between an Identity column and a simple INT column.
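A sketch of that hybrid, with illustrative object and column names (available in SQL Server 2012 and later):

CREATE SEQUENCE dbo.TicketNumbers AS INT START WITH 1 INCREMENT BY 1;

-- TicketID is a plain INT: it auto-numbers through the default constraint,
-- but explicit values can be inserted at any time, no IDENTITY_INSERT needed.
CREATE TABLE dbo.Tickets
(
    TicketID INT NOT NULL
        CONSTRAINT DF_Tickets_TicketID DEFAULT (NEXT VALUE FOR dbo.TicketNumbers)
        CONSTRAINT PK_Tickets PRIMARY KEY,
    Title    NVARCHAR(200) NOT NULL
);

INSERT INTO dbo.Tickets (Title) VALUES (N'Auto-numbered ticket');           -- next sequence value
INSERT INTO dbo.Tickets (TicketID, Title) VALUES (42, N'Explicit ticket');  -- explicit id

-- Note: with a sequence you are responsible for keeping explicit values and
-- generated values from colliding on the primary key.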
To achieve this in recent versions of SQL Server, use the OUTPUT clause (definition on MSDN).
OUTPUT clause used with a table variable:
DECLARE @MyTableVar TABLE (...);

DELETE FROM dbo.CurrentTickets
OUTPUT DELETED.* INTO @MyTableVar
WHERE <...>;

INSERT INTO dbo.ClosedTickets
SELECT * FROM @MyTableVar;
The second table should have the ID column as well, but without the IDENTITY property; its values are generated by the other table.
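For illustration, a minimal sketch of the two tables under this pattern (column names beyond TicketID are assumed):

-- Only CurrentTickets generates ids; ClosedTickets stores whatever value it receives.
CREATE TABLE dbo.CurrentTickets
(
    TicketID INT IDENTITY(1,1) PRIMARY KEY,
    Title    NVARCHAR(200) NOT NULL
);

CREATE TABLE dbo.ClosedTickets
(
    TicketID INT NOT NULL PRIMARY KEY,   -- no IDENTITY here
    Title    NVARCHAR(200) NOT NULL
);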

SQL Server breaks the assigned identity seed increment?

I have two tables: table1 (column id: seed 1000, increment 1) & table2 (column id: seed 2000, increment 1). First I insert some records into table1 & table2. Second, I insert the records from table2 into table1 (using IDENTITY_INSERT ON) and get something like:
1000 first_record_table1
1001 second_record_table1
1002 third_record_table1
2000 first_record_table2
2001 second_record_table2
Third: if I add a new record to table1, I expect to get 1003 as the new id, but I get 2002.
(1003 would not break the unique id rule and is the correct increment value for table1, so I don't see any reason to jump to the last record and increment by 1, as the database seems to do.)
Question: How can I get 1003 as the new id?
The documentation on identity is quite clear:
The identity property on a column does not guarantee the following:
Uniqueness of the value – Uniqueness must be enforced by using a PRIMARY KEY or UNIQUE constraint or UNIQUE index.
Consecutive values within a transaction – A transaction inserting multiple rows is not guaranteed to get consecutive values for the rows because other concurrent inserts might occur on the table. If values must be consecutive then the transaction should use an exclusive lock on the table or use the SERIALIZABLE isolation level.
Consecutive values after server restart or other failures – SQL Server might cache identity values for performance reasons and some of the assigned values can be lost during a database failure or server restart. This can result in gaps in the identity value upon insert. If gaps are not acceptable then the application should use a sequence generator with the NOCACHE option or use their own mechanism to generate key values.
Reuse of values – For a given identity property with specific seed/increment, the identity values are not reused by the engine. If a particular insert statement fails or if the insert statement is rolled back then the consumed identity values are lost and will not be generated again. This can result in gaps when the subsequent identity values are generated.
In most cases, an identity primary key column does just what it is intended to do. It creates a new value bigger than any previous value when a new row is inserted. There may be gaps, but that is not a problem for a primary key.
If you want a column that fills in gaps, then you will have to write a trigger to assign the values.
Alternatively, you could just use row_number() in queries to get sequential values.
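For example, a gap-free numbering can be produced at query time without touching the stored identity values (a sketch against table1 from the question):

SELECT ROW_NUMBER() OVER (ORDER BY id) AS seq_no,
       t.*
FROM table1 AS t;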
Increment 1 just means the next record will have an ID that's one larger than the largest one, not that all numbers will be used. If you want 1003 as the new ID, you'll have to do some programming yourself, and handle the case that the new id is already taken once you reach 2000.
But you should never, ever rely on generated IDs being free of gaps. Imagine you have 2 sessions. The first session inserts something and has neither committed nor rolled back yet. The second session inserts something and commits. Then the first session rolls back. How do you want to handle this case? You will need to assign some ID to the first session, and a different ID to the second one. Now the first session rolls back, so its ID is unused. Do you want to subtract one from the second session's ID? Very bad idea. Block the 2nd session until the 1st one has either rolled back or committed? Very bad idea as well.
So please forget about consecutive IDs, they will never work. Unique IDs are what you normally need, and SQL Server guarantees them.

How can I get the Primary Key id of a file I just INSERTED?

Earlier today I asked this question which arose from A- My poor planning and B- My complete disregard for the practice of normalizing databases. I spent the last 8 hours reading about normalizing databases and the finer points of JOIN and worked my way through the SQLZoo.com tutorials.
I am enlightened. I understand the purpose of database normalization and how it can suit me. Except that I'm not entirely sure how to execute that vision from a procedural standpoint.
Here's my old vision: 1 table called "files" that held, let's say, a file id, a file url, and the appropriate grade levels for that file.
New vision!: 1 table for "files", 1 table for "grades", and a junction table to mediate.
But that's not my problem. This is a really basic Q that I'm sure has an obvious answer- When I create a record in "files", it gets assigned the incremented primary key automatically (file_id). However, from now on I'm going to need to write that file_id to the other tables as well. Because I don't assign that id manually, how do I know what it is?
If I upload text.doc and it gets file_id 123, how do I know it got 123 in order to write it to "grades" and the junction table? I can't do a max(file_id) because if you have concurrent users, you might nab a different id. I just don't know how to get the file_id value without having manually assigned it.
You may want to use LAST_INSERT_ID() as in the following example:
START TRANSACTION;
INSERT INTO files (file_id, url) VALUES (NULL, 'text.doc');
INSERT INTO grades (file_id, grade) VALUES (LAST_INSERT_ID(), 'some-grade');
COMMIT;
The transaction ensures that the operation remains atomic: it guarantees that either both inserts complete successfully or neither does. This is optional, but it is recommended in order to maintain the integrity of the data.
For LAST_INSERT_ID(), the most recently generated ID is maintained in the server on a per-connection basis. It is not changed by another client. It is not even changed if you update another AUTO_INCREMENT column with a nonmagic value (that is, a value that is not NULL and not 0).
Using LAST_INSERT_ID() and AUTO_INCREMENT columns simultaneously from multiple clients is perfectly valid. Each client will receive the last inserted ID for the last statement that client executed.
Source and further reading:
MySQL Reference: How to Get the Unique ID for the Last Inserted Row
MySQL Reference: START TRANSACTION, COMMIT, and ROLLBACK Syntax
In PHP, to get the automatically generated ID of a MySQL record, use the mysqli->insert_id property of your mysqli object.
How are you going to find the entry tomorrow, after your program has forgotten the value of last_insert_id()?
Using a surrogate key is fine, but your table still represents an entity, and you should be able to answer the question: what measurable properties define this particular entity? The set of these properties is the natural key of your table, and even if you use surrogate keys, such a natural key should always exist and you should use it to retrieve information from the table. Use the surrogate key to enforce referential integrity, for indexing purposes, and to make joins easier on the eye. But don't let it escape from the database.
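A sketch of the idea for the files table from the question, assuming the url is the natural key:

CREATE TABLE files (
    file_id INT AUTO_INCREMENT PRIMARY KEY,   -- surrogate key: FKs, joins, indexing
    url     VARCHAR(255) NOT NULL,
    CONSTRAINT uq_files_url UNIQUE (url)      -- natural key: what actually identifies the file
);

-- Tomorrow, after the program has forgotten LAST_INSERT_ID(),
-- the row is still findable by what you know about it:
SELECT file_id FROM files WHERE url = 'text.doc';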

identity to be incremented only if record is inserted

SQL Server 2005: I have a column empid in the employee table with identity on. If there is some error while inserting data into the table, the identity is incremented anyway. I want the identity to be incremented only if the record is inserted. For example, if I have generated emp ids from 1 to 5 and then an error occurs on the 6th record insertion, on the next record insertion the identity value will be 7; I want it to be 6.
Why do you want to do that?
The identity column should only be used as an 'internal administrative value' for the database, and it should have no 'business value', so why does it matter that there are gaps in that sequence?
If identity is used correctly, then users of your software will never be faced with the column that has an identity value; you just use it to uniquely identify a record.
I don't think this can be done. If you want your identity numbers to be exactly sequential then you may have to generate them yourself, rather than using the SQL Identity feature.
Edit: even rolling back the failed transactions will not make the identity count go back down; this is by design, see this other question.
What valid business reason do you have for caring if there are gaps? There is no reason for the database to care, and every reason to want to make sure that identity values are never reused for something else, as they can cause major problems with data integrity when looking up information based on old reports, etc. Suppose you have a report that shows the orders from last month, and then you delete one of the records because the customer was duplicated and thus de-duped. Then you reuse the identity value for the duplicated customer that was removed. Now someone looking at last month's report goes to look up customer 12345, and the data associated with that customer belongs to John Smith rather than Sally Jones. But the person doesn't know that because she is using an aggregate, so now she has incorrect information that was totally avoidable. If she had looked up the deleted customer, the process could instead have redirected her to the correct customer left after the de-duping.
When you need this specific behaviour you should use stored procedures to generate the ID. This way you can really do a rollback. But keep in mind that the current behaviour is there on purpose.
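A minimal sketch of such a procedure; the ids key table (columns name, id) and the procedure name are assumptions for the example:

CREATE PROCEDURE dbo.GetNextId
    @name  sysname,
    @newId INT OUTPUT
AS
BEGIN
    SET NOCOUNT ON;

    -- Increment and capture in one atomic statement. Because this runs inside
    -- the caller's transaction, rolling that transaction back also undoes the
    -- increment, so no value is "lost" the way an identity value would be.
    UPDATE dbo.ids
    SET @newId = id = id + 1
    WHERE name = @name;
END;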
Transaction isolation and different read levels (dirty reads) will most likely get you into trouble if you don't use locking on that id field in the master data table that holds the current or next ID value.