Alternatives to Identity Column for Table With Frequent Inserts & Deletes?

Let's say I have a session table like this:
[Session]
-------
Id: int
UserId: int
Imagine this table is used on an extremely high-traffic site, where sessions are added and deleted very frequently. If I make the Id column an identity column, how can I easily maintain its seeding so that it doesn't hit the limits of the int data type? Is there an alternative way of ensuring unique Ids that I'm not thinking of? Thanks in advance.

Instead of int, make it bigint; this will go up to 9,223,372,036,854,775,807.
You can of course start at -9,223,372,036,854,775,808 as well.
See also What To Do When Your Identity Column Maxes Out.
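For illustration, a minimal sketch of the Session table from the question, assuming SQL Server; seeding at the minimum value doubles the usable range:

CREATE TABLE [Session]
(
    Id     bigint IDENTITY(-9223372036854775808, 1) PRIMARY KEY,
    UserId int NOT NULL
);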

Make the Id a GUID instead of an int.
You get unique session ids that are not guessable and are easy to generate with Guid.NewGuid().
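On the database side this could look like the following sketch (column default assumed; in SQL Server, NEWSEQUENTIALID() is an alternative if the index fragmentation caused by random GUIDs is a concern):

CREATE TABLE [Session]
(
    Id     uniqueidentifier NOT NULL DEFAULT NEWID() PRIMARY KEY,
    UserId int NOT NULL
);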

If you have a site maintenance period you could just reseed the identity column. Naff but simple.
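In SQL Server that reseed is a one-liner, assuming the table has been emptied (or its remaining Ids no longer conflict) during the maintenance window:

DBCC CHECKIDENT ('Session', RESEED, 0);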

How long can a given session exist? If no session will last more than X period of time, and you know that you will never have more than N sessions present in the table at any given time, and you know the maximum rate at which new sessions will be added, then you could implement some form of circular queue system, cycling over a maximum set of numbers.
For example, if you never have more than 1000 rows in the table at any given point in time, no more than 1000 rows will be added in any given 5 minute period, and no row will persist for more than 2 days (nightly clean-up routine?), then you would go through 1000 * 2 * 24 * 12 = 576,000 Ids every two days... where every id gets added, used, and removed from the system every two days. Build circular queue logic around a large safety factor of that number (5,000,000, maybe), and you could be covered.
The hard part, of course, is generating the Id. I've done that in the past with a one-row "NextId" table which was defined and called like so:
-- Create table
CREATE TABLE NextId
(NextId int NOT NULL)

-- Add the one row to the table
INSERT NextId (NextId) VALUES (1)
Optionally, put an INSERT/DELETE trigger on here to prevent the addition or deletion of rows
This procedure is used to get the next Id. The single UPDATE statement is of course atomic, so you don't have to worry about locking. I used a modulus of 10 for testing purposes. You will end up with an Id value of 0 every now and then, but it's a surrogate key, so the actual value used should not matter.
CREATE PROCEDURE GetNextId
    @NextId int OUTPUT
AS
SET NOCOUNT ON

UPDATE NextId
SET @NextId = NextId
   ,NextId = (NextId + 1) % 10  -- 10 for testing; use e.g. 5000000 in production

RETURN
Here's how the procedure would be called:
DECLARE @NextId int
EXECUTE GetNextId @NextId OUTPUT
PRINT @NextId
I don't know how well this would work in excessively high-volume situations, but it does work well under fair-size workloads.

Related

GORM auto-increments primary key even if data wasn't inserted into DB [duplicate]

I'm using MySQL's AUTO_INCREMENT field and InnoDB to support transactions. I noticed that when I roll back a transaction, the AUTO_INCREMENT value is not rolled back. I found out that it was designed this way, but are there any workarounds?
It can't work that way. Consider:
Program one opens a transaction and inserts into a table FOO which has an auto-increment primary key (arbitrarily, we say it gets 557 for its key value).
Program two starts, it opens a transaction and inserts into table FOO getting 558.
Program two inserts into table BAR which has a column which is a foreign key to FOO. So now the 558 is located in both FOO and BAR.
Program two now commits.
Program three starts and generates a report from table FOO. The 558 record is printed.
After that, program one rolls back.
How does the database reclaim the 557 value? Does it go into FOO and decrement all the other primary keys greater than 557? How does it fix BAR? How does it erase the 558 printed on the report program three output?
Oracle's sequence numbers are also independent of transactions for the same reason.
If you can solve this problem in constant time, I'm sure you can make a lot of money in the database field.
Now, suppose you have a requirement that your auto-increment field never have gaps (for auditing purposes, say). Then you cannot roll back your transactions. Instead, you need a status flag on your records: on first insert, the record's status is "incomplete"; then you start the transaction, do your work, and update the status to "complete" (or whatever you need). When you commit, the record is live. If the transaction rolls back, the incomplete record is still there for auditing. This will cause you many other headaches, but it is one way to deal with audit trails.
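A minimal sketch of that status-flag pattern in MySQL (table and column names assumed):

-- Reserve the row first; its auto-increment id survives any later rollback.
INSERT INTO invoices (status) VALUES ('incomplete');
SET @id = LAST_INSERT_ID();

START TRANSACTION;
-- ... do the real work against @id ...
UPDATE invoices SET status = 'complete' WHERE id = @id;
COMMIT;  -- a ROLLBACK here would leave the 'incomplete' row behind for auditing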
Let me point out something very important:
You should never depend on the numeric features of autogenerated keys.
That is, other than comparing them for equality (=) or inequality (<>), you should not do anything else with them. No relational operators (<, >), no sorting by them, etc. If you need to sort by "date added", have a "date added" column.
Treat them as apples and oranges: Does it make sense to ask if an apple is the same as an orange? Yes. Does it make sense to ask if an apple is larger than an orange? No. (Actually, it does, but you get my point.)
If you stick to this rule, gaps in the continuity of autogenerated indexes will not cause problems.
I had a client who needed the ID to roll back on a table of invoices, where the numbering had to be consecutive.
My solution in MySQL was to remove the AUTO_INCREMENT, pull the latest Id from the table, add one (+1), and then insert it manually.
If the table is named "TableA" and the Auto-increment column is "Id"
INSERT INTO TableA (Id, Col2, Col3, Col4, ...)
VALUES (
    (SELECT t.Id FROM TableA t ORDER BY t.Id DESC LIMIT 1) + 1,
    Col2_Val, Col3_Val, Col4_Val, ...)
Why do you care if it is rolled back? AUTO_INCREMENT key fields are not supposed to have any meaning so you really shouldn't care what value is used.
If you have information you're trying to preserve, perhaps another non-key column is needed.
I do not know of any way to do that. According to the MySQL documentation, this is expected behavior and happens in all innodb_autoinc_lock_mode lock modes. The specific text is:
In all lock modes (0, 1, and 2), if a transaction that generated auto-increment values rolls back, those auto-increment values are "lost." Once a value is generated for an auto-increment column, it cannot be rolled back, whether or not the "INSERT-like" statement is completed, and whether or not the containing transaction is rolled back. Such lost values are not reused. Thus, there may be gaps in the values stored in an AUTO_INCREMENT column of a table.
If you set auto_increment to 1 after a rollback or deletion, on the next insert, MySQL will see that 1 is already used and will instead get the MAX() value and add 1 to it.
This will ensure that if the row with the last value is deleted (or the insert is rolled back), it will be reused.
To set the auto_increment to 1, do something like this:
ALTER TABLE tbl auto_increment = 1
This is not as efficient as simply continuing with the next number, because MAX() can be expensive, but if you delete or roll back infrequently and are set on reusing the highest value, this is a realistic approach.
Be aware that this does not prevent gaps from records deleted in the middle, nor does it help if another insert occurs before you set auto_increment back to 1.
INSERT INTO prueba (id)
VALUES (
    (SELECT IFNULL(MAX(id), 0) + 1 FROM prueba target))
The IFNULL handles the case where the table is empty (zero rows), and the target alias on the subquery is needed to avoid MySQL's error about selecting from the same table being modified.
If you need to have the ids assigned in numerical order with no gaps, then you can't use an autoincrement column. You'll need to define a standard integer column and use a stored procedure that calculates the next number in the insert sequence and inserts the record within a transaction. If the insert fails, then the next time the procedure is called it will recalculate the next id.
Having said that, it is a bad idea to rely on ids being in some particular order with no gaps. If you need to preserve ordering, you should probably timestamp the row on insert (and potentially on update).
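A minimal sketch of that stored-procedure idea, reduced to a single MySQL statement (table and column names assumed; under concurrency you would also need appropriate locking or isolation):

START TRANSACTION;
-- Compute the next id and insert in one statement; if the insert fails,
-- the rollback leaves no gap and the next call recomputes the id.
INSERT INTO invoices (id, total)
SELECT COALESCE(MAX(id), 0) + 1, 100.00
FROM invoices;
COMMIT;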
Concrete answer to this specific dilemma (which I also had) is the following:
1) Create a table that holds different counters for different documents (invoices, receipts, RMAs, etc.). Insert a record for each of your document types and set the initial counter to 0.
2) Before creating a new document, do the following (for invoices, for example):
UPDATE document_counters SET counter = LAST_INSERT_ID(counter + 1) where type = 'invoice'
3) Get the last value that you just updated to, like so:
SELECT LAST_INSERT_ID()
or just use your PHP (or whatever) mysql_insert_id() function to get the same thing
4) Insert your new record along with the primary ID that you just got back from the DB. This overrides the current auto-increment index and makes sure you have no ID gaps between your records.
This whole thing needs to be wrapped inside a transaction, of course. The beauty of this method is that when you roll back a transaction, the UPDATE statement from step 2 is rolled back too, and the counter does not change. Other concurrent transactions block until the first transaction is either committed or rolled back, so they cannot see either the old counter or the new one until the first transaction finishes.
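Putting the steps together, a minimal sketch (table and column names assumed):

START TRANSACTION;

-- Step 2: reserve the next counter value for this connection.
UPDATE document_counters
SET counter = LAST_INSERT_ID(counter + 1)
WHERE type = 'invoice';

-- Steps 3 and 4: LAST_INSERT_ID() now returns the reserved value.
INSERT INTO invoices (id, total)
VALUES (LAST_INSERT_ID(), 100.00);

COMMIT;  -- a ROLLBACK would undo the counter update as well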
SOLUTION:
Let's use tbl_test as an example table, and suppose the field Id has the AUTO_INCREMENT attribute:
CREATE TABLE tbl_test (
    Id int NOT NULL AUTO_INCREMENT,
    Name varchar(255) NULL,
    PRIMARY KEY (Id)
);
Let's suppose that the table already has hundreds or thousands of rows and you don't want to use AUTO_INCREMENT anymore, because every rolled-back transaction still adds +1 to the AUTO_INCREMENT value.
To avoid that, you can do the following.
First, remove the AUTO_INCREMENT attribute from the Id column (this won't delete your inserted rows):
ALTER TABLE tbl_test MODIFY COLUMN Id int(11) NOT NULL FIRST;
Finally, we create a BEFORE INSERT trigger to generate the Id value automatically. Done this way, rolling back a transaction no longer affects the Id sequence.
CREATE TRIGGER trg_tbl_test_1
BEFORE INSERT ON tbl_test
FOR EACH ROW
BEGIN
    SET NEW.Id = COALESCE((SELECT MAX(Id) FROM tbl_test), 0) + 1;
END;
That's it! You're done!
You're welcome.
// Demonstrates that alternating COMMIT/ROLLBACK leaves no id gaps when
// auto_increment is reset after every statement.
$masterConn = mysql_connect("localhost", "root", '');
mysql_select_db("sample", $masterConn);

for ($i = 1; $i <= 10; $i++) {
    mysql_query("START TRANSACTION", $masterConn);
    $qry_insert = "INSERT INTO `customer` (`id`, `a`, `b`) VALUES (NULL, '$i', 'a')";
    mysql_query($qry_insert, $masterConn);
    if ($i % 2 == 1) mysql_query("COMMIT", $masterConn);
    else             mysql_query("ROLLBACK", $masterConn);
    // Reset so the next insert reuses MAX(id) + 1 instead of leaving a gap.
    mysql_query("ALTER TABLE customer auto_increment = 1", $masterConn);
}
echo "Done";

How do I increase 1 value over multiple rows?

I have an item tax table that records the different tax rates for different counties in our state. Each row has an ID number (1-130). Our front-end software always orders the tax options by this number when we want them alphabetical. Most of our rows were added in that order, but now I want to be able to insert rows.
Thus I need to add 1 to every entry after a certain number (e.g. 37-130 all need to increase by one). Unfortunately, this is the primary key. Is it possible to increase this value on all of them easily, or in a loop? I'll have to do this repeatedly, as we're moving about a dozen entries, if possible.
UPDATE ItemTax
SET ID = ID + 1
WHERE ID = [last ID number]
Treating your question as academic, and not endorsing this as an actual solution, you can do this:
UPDATE ItemTax
SET ID = ID + 1
WHERE ID > 37
Depending upon how you use this id, it might be better to leave the original ID column unchanged, e.g.:
alter table ItemTax add NewID int null
GO

update ItemTax set NewID =
    case
        when ID between 37 and 130 then ID + 1
        else ID
    end
Now you don't have to update foreign key relationships, etc.
You see, an ID usually represents a surrogate key, which should never have its value changed in a good design. So your desire to change its value suggests that you do not understand your design as well as you should. We all start from ignorance; I have made some very poor decisions in the past.
If this is the only change there will ever be for NewID, you don't even need a physical column, a computed column would serve well. But if this is the first mod of many a physical column is likely a better choice.
You also mention inserting rows. Build in some room between values so you can insert rows and change values as needed; with gaps, you can rearrange rows by tweaking individual values without renumbering entire blocks of rows just to insert a single row, e.g.:
update ItemTax set NewID = ID * 100

Add a new column to big database table

I need to add a new column to a table in my database. The table contains around 140 million rows and I'm not sure how to proceed without locking the database.
The database is in production and that's why this has to be as smooth as it can get.
I have read a lot but never really found an answer on whether this is a risky operation or not.
The new column is nullable and the default can be NULL. As I understand it, there is a bigger issue if the new column needs a default value.
I'd really appreciate some straightforward answers on this matter. Is this doable or not?
Yes, it is eminently doable.
Adding a column where NULL is acceptable and has no default value does not require a long-running lock to add data to the table.
If you supply a default value, then SQL Server has to go and update each record in order to write that new column value into the row.
How it works in general:
+---------------------+------------------------+--------------------+
| Column is Nullable? | Default Value Supplied | Result             |
+---------------------+------------------------+--------------------+
| Yes                 | No                     | Quick add (caveat) |
| Yes                 | Yes                    | Long-running lock  |
| No                  | No                     | Error              |
| No                  | Yes                    | Long-running lock  |
+---------------------+------------------------+--------------------+
The caveat bit:
I can't remember off the top of my head what happens when you add a column that causes the size of the NULL bitmap to be expanded. I'd like to say that the NULL bitmap represents the nullability of all the columns currently in the row, but I can't put my hand on my heart and say that's definitely true.
Edit -> @MartinSmith pointed out that the NULL bitmap will only expand when the row is changed, many thanks. However, as he also points out, if the size of the row expands past the 8060-byte limit in SQL Server 2012 then a long-running lock may still be required. Many thanks * 2.
Second caveat:
Test it.
Third and final caveat:
No really, test it.
My example shows how to add a new column to a table with tens of millions of rows and fill it with a default value without a long-running lock:
USE [MyDB]
GO
ALTER TABLE [dbo].[Customer] ADD [CustomerTypeId] TINYINT NULL
GO
ALTER TABLE [dbo].[Customer] ADD CONSTRAINT [DF_Customer_CustomerTypeId] DEFAULT 1 FOR [CustomerTypeId]
GO
DECLARE @batchSize bigint = 5000
       ,@rowcount  int
       ,@MaxID     int;

SET @rowcount = 1
SET @MaxID = 0

WHILE @rowcount > 0
BEGIN
    ;WITH upd AS (
        SELECT TOP (@batchSize)
               [ID]
              ,[CustomerTypeId]
        FROM [dbo].[Customer] (NOLOCK)
        WHERE [CustomerTypeId] IS NULL
          AND [ID] > @MaxID
        ORDER BY [ID])
    UPDATE upd
    SET [CustomerTypeId] = 1
       ,@MaxID = CASE WHEN [ID] > @MaxID THEN [ID] ELSE @MaxID END

    SET @rowcount = @@ROWCOUNT
    WAITFOR DELAY '00:00:01'
END;
ALTER TABLE [dbo].[Customer] ALTER COLUMN [CustomerTypeId] TINYINT NOT NULL;
GO
ALTER TABLE [dbo].[Customer] ADD [CustomerTypeId] TINYINT NULL changes only the metadata (Sch-M lock), and the lock time does not depend on the number of rows in the table.
After that, I fill the new column with the default value in small batches (5,000 rows), waiting one second after each cycle so as not to lock the table too aggressively. I have an int column ID as the primary clustered key.
Finally, once the whole column is filled, I change it to NOT NULL.
No one can tell how much time the operation will take, as it depends on many other factors.
You should not be worried about the operation itself, because SQL Server is doing everything right:
The Database Engine uses schema modification (Sch-M) locks during a table data definition language (DDL) operation, such as adding a column or dropping a table. During the time that it is held, the Sch-M lock prevents concurrent access to the table. This means the Sch-M lock blocks all outside operations until the lock is released.
I have never done an ALTER operation on such an amount of data, and the only advice I can give is to do it when there are not so many connections to the database (during the night).
EDIT:
Here you can find more information about your question. Generally, Matt Whitfield is right:
The only time that adding a column to a table results in a size-of-data operation (i.e. an operation that modifies every row in a table) is when the new column has a non-null default.
and when
New column is nullable, with a NULL default. The table's metadata records the fact that the new column exists but may not be in the record. This is why the null bitmap also has a count of the number of columns in that particular record. SQL Server can work out whether a column is present in the record or not. So this is NOT a size-of-data operation: the existing table records are not updated when the new column is added. The records will be updated only when they are updated for some other operation.
There is one other way I sometimes use: export the table, add the new column to the exported copy, import that copy under a new name, then rename the existing table out of the way and rename the imported table back to the original name.
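As a rough T-SQL sketch of that copy-and-swap idea (names assumed; indexes, constraints, and permissions omitted):

-- Build a copy of the table that includes the new column.
SELECT c.*, CAST(NULL AS tinyint) AS CustomerTypeId
INTO dbo.Customer_New
FROM dbo.Customer AS c;

-- Swap the tables; each rename is a quick metadata operation.
BEGIN TRANSACTION;
EXEC sp_rename 'dbo.Customer', 'Customer_Old';
EXEC sp_rename 'dbo.Customer_New', 'Customer';
COMMIT;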

Auto Increment feature of SQL Server

I have created a table named ABC. It has three columns. The column number_pk (int) is the primary key of my table, and I have turned the auto-increment feature on for that column.
Now I have deleted two rows from that table, say Number_pk = 5 and Number_pk = 6.
If I again enter two new rows into this table, the two new Number_pk values start from 7 and 8.
My question is: what is the logic behind this, since I have deleted those two rows from the table? I know the simple answer is that I have set auto-increment on for the primary key, but I want to know whether there is any way to insert the two new entries starting from the last Number_pk without changing the design of my table.
And how does SQL Server manage this record-keeping, since I have deleted the rows from the database?
The logic is guaranteeing that the generated numbers are unique. An ID field does not necessarily have to have a meaning; rather, it is most often used to identify a unique record, thus making it easier to perform operations on it.
If your database is designed properly, the deleted ID numbers would not have been possible to delete if they were referenced by any other tables in a foreign key relationship, thus preventing records from being orphaned in that way.
If you absolutely want sequential entries, you could consider issuing a RESEED, but as suggested, it would not really give you much advantage.
The identity record is "managed" because SQL Server will keep track of which numbers have been issued, regardless of whether they are still present or not.
Should you ever want to delete all records from a table, there are two ways to do so (provided no foreign key relations exist):
DELETE FROM Table
DELETE just removes the records, but the next inserted value will continue where the ID numbering left off.
TRUNCATE TABLE
TRUNCATE will actually RESEED the table, thus guaranteeing it starts again at the value you originally specified (most likely 1).
Although you should not do this unless there is a specific requirement.
1.) Get the max id:
DECLARE @id int
SELECT @id = MAX(Number_pk) FROM ABC
2.) And reseed the identity column (the next insert will then use @id + 1):
DBCC CHECKIDENT('ABC', RESEED, @id)
DBCC CHECKIDENT (Transact-SQL)

Generate unique ID to share with multiple tables SQL 2008

I have a couple of tables in a SQL 2008 server that I need to generate unique ID's for. I have looked at the "identity" column but the ID's really need to be unique and shared between all the tables.
So if I have, say, five tables of the flavour "asset infrastructure" and I want to run with a unique ID between them as a combined group, I need some sort of generator that looks at all five tables and issues the next ID which is not duplicated in any of those five tables.
I know this could be done with some sort of stored procedure but I'm not sure how to go about it. Any ideas?
The simplest solution is to set your identity seeds and increment on each table so they never overlap.
Table 1: Seed 1, Increment 5
Table 2: Seed 2, Increment 5
Table 3: Seed 3, Increment 5
Table 4: Seed 4, Increment 5
Table 5: Seed 5, Increment 5
The identity column mod 5 will tell you which table the record is in. You will use up your identity space five times faster so make sure the datatype is big enough.
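A minimal sketch of that setup in T-SQL (table and column names assumed):

CREATE TABLE Asset1 (Id bigint IDENTITY(1, 5) PRIMARY KEY, Name varchar(50));
CREATE TABLE Asset2 (Id bigint IDENTITY(2, 5) PRIMARY KEY, Name varchar(50));
CREATE TABLE Asset3 (Id bigint IDENTITY(3, 5) PRIMARY KEY, Name varchar(50));
CREATE TABLE Asset4 (Id bigint IDENTITY(4, 5) PRIMARY KEY, Name varchar(50));
CREATE TABLE Asset5 (Id bigint IDENTITY(5, 5) PRIMARY KEY, Name varchar(50));
-- Id % 5 identifies the source table (1-4, with 0 meaning Asset5).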
Why not use a GUID?
You could let them each have an identity that seeds from numbers far enough apart never to collide.
GUIDs would work but they're butt-ugly, and non-sequential if that's significant.
Another common technique is to have a single-column table with an identity that dispenses the next value each time you insert a record. If you need them pulling from a common sequence, it's not unlikely to be useful to have a second column indicating which table it was dispensed to.
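A minimal sketch of such a dispenser table in T-SQL (names assumed):

CREATE TABLE IdDispenser
(
    Id       int IDENTITY(1, 1) PRIMARY KEY,
    IssuedTo sysname NOT NULL  -- which table received this value
);

-- Reserve the next value for, say, Table3:
INSERT INTO IdDispenser (IssuedTo) VALUES ('Table3');
SELECT SCOPE_IDENTITY() AS NextId;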
You realize there are logical design issues with this, right?
Reading into the design a bit, it sounds like what you really need is a single table called "Asset" with an identity column, and then either:
a) 5 additional tables for the subtypes of assets, each with a foreign key to the primary key on Asset; or
b) 5 views on Asset that each select a subset of the rows and then appear (to users) like the 5 original tables you have now.
If the columns on the tables are all the same, (b) is the better choice; if they're all different, (a) is the better choice. This is a classic DB spin on the supertype / subtype relationship.
Alternately, you could do what you're talking about and recreate the IDENTITY functionality yourself with a stored proc that wraps INSERT access on all 5 tables. Note that you'll have to put a TRANSACTION around it if you want guarantees of uniqueness, and if this is a popular table, that might make it a performance bottleneck. If that's not a concern, a proc like that might take the form:
CREATE PROCEDURE InsertAsset_Table1
AS
BEGIN
    BEGIN TRANSACTION
    -- SELECT MIN INTEGER NOT ALREADY USED IN ANY OF THE FIVE TABLES
    -- INSERT INTO Table1 WITH THAT ID
    COMMIT TRANSACTION -- or roll back on error, etc.
END
Again, SQL is highly optimized for helping you out if you choose the patterns I mention above, and NOT optimized for this kind of thing (there's overhead with creating the transaction AND you'll be issuing shared locks on all 5 tables while this process is going on). Compare that with using the PK / FK method above, where SQL Server knows exactly how to do it without locks, or the view method, where you're only inserting into 1 table.
I found this when searching on Google. I am facing a similar problem for the first time. I had the idea of a dedicated ID table specifically to generate the IDs, but I was unsure whether that was considered OK design. So I just wanted to say THANKS for the confirmation; it looks like it is an adequate solution, although not ideal.
I have a very simple solution. It should be good for cases when the number of tables is small:
create table T1(ID int primary key identity(1,2), rownum varchar(64))
create table T2(ID int primary key identity(2,2), rownum varchar(64))
insert into T1(rownum) values('row 1')
insert into T1(rownum) values('row 2')
insert into T1(rownum) values('row 3')
insert into T2(rownum) values('row 1')
insert into T2(rownum) values('row 2')
insert into T2(rownum) values('row 3')
select * from T1
select * from T2
drop table T1
drop table T2
This is a common problem, for example, when using a table of people (called PERSON, singular, please) where each person is categorized: for example doctors, patients, employees, nurses, etc.
It makes a lot of sense to create a table for each of these categories of people that contains their specific information, like an employee's start date and salary, or a nurse's qualifications and registration number.
A patient, for example, may have many nurses and doctors that work on him, so a many-to-many table that links the patient to other people in the PERSON table facilitates this nicely. In this table there should be some description of the relationship between these people, which leads us back to the categories of people.
Since a Doctor and a Patient could create the same Primary Key ID in their own tables, it becomes very useful to have a Globally unique ID or Object ID.
A good way to do this as suggested, is to have a table designated to Auto Increment the primary key. Perform an Insert on that Table first to obtain the OID, then use it for the new PERSON.
I like to go a step further. When things get ugly (some new developer gets his hands on the database, or even worse, a really old developer), it's very useful to add more meaning to the OID.
Usually this is done programmatically, not with the database engine, but if you use a BIGINT for all the primary key IDs then you have lots of room to prefix a number with a visually identifiable sequence. For example, all doctors' IDs could begin with 100, all patients' with 110, all nurses' with 120.
To that I would append say a Julian date or a Unix date+time, and finally append the Auto Increment ID.
This would result in numbers like:
110,2455892,00000001
120,2455892,00000002
100,2455892,00000003
Since the Julian date 100 years from now is only 2492087, you can see that 7 digits will adequately store this value.
A BIGINT is a 64-bit (8-byte) signed integer with a range of -9.22x10^18 to 9.22x10^18 (-2^63 to 2^63 - 1). Notice the exponent is 18: that's 18 digits you have to work with.
Using this design, you are limited to 100 million OIDs, 999 categories of people, and dates up to... well, past the shelf life of your database, but I suspect that's good enough for most solutions.
The operations required to create an OID like this are all multiplication and division, which avoids all the gear-grinding of text manipulation.
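As a quick T-SQL sketch of that arithmetic (prefix values and variable names assumed):

-- Compose: 3-digit category + 7-digit Julian day + 8-digit counter = 18 digits.
DECLARE @category bigint = 110       -- e.g. patients
       ,@julian   bigint = 2455892   -- Julian day number
       ,@counter  bigint = 1;        -- auto-increment value

DECLARE @oid bigint = @category * 1000000000000000  -- shift left 15 digits
                    + @julian   * 100000000         -- shift left 8 digits
                    + @counter;

SELECT @oid;  -- 110245589200000001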
The disadvantage is that INSERTs require more than a simple T-SQL statement, but the advantage is that when you are tracking down errant data, or even being clever in your queries, your OID visually tells you a lot more than a random number or, worse, an eyesore like a GUID.