Is it possible to reuse an identity field value after deleting rows in SQL Server 2008 Express? Here is an example. Suppose I have a table with an Id field as a primary key (identity). If I add five rows, I will have these 5 Ids: 1, 2, 3, 4, 5. If I were to delete these rows, and then add five more, the new rows would have Ids: 6, 7, 8, 9, 10. Is it possible to let it start over at 1 again?
Do I have to delete data from another table in order to accomplish this? Thanks for your help.
Is it possible to reuse an identity field value after deleting rows in SQL Server 2008 Express?
Yes, but it's one of the worst decisions you could make.
This is so bad that I'm not going to explain how to do it, only why it's a bad idea.
We use identity columns to generate values that are unrelated to the data the table holds.
If we use identities as primary keys and reuse their values, the same key will refer to 2 (or more) different entities over time, causing confusion, especially when generating reports or when other tables use this key without referential integrity.
Solutions for identifying unused values (the infamous gaps-and-islands problem) are very slow as the number of rows increases, so performance drops as rows pile up.
You would have to apply this "unused number" logic every time you insert rows into the table, so it's one more thing to keep in mind during maintenance tasks.
There is little to no benefit in reusing these numbers, as the INT or BIGINT maximum values are high enough to support most business data.
It's common to use identities as primary keys or with a clustered index because new records are inserted at the end, reducing fragmentation. Once records are deleted, gaps appear between pages (the "swiss cheese" pages), which is solved by index rebuilds. The problem comes after the rebuilds: you would potentially be inserting large numbers of new records in between existing ones, causing tons of page splits and hurting performance unnecessarily.
I can't possibly think of a real case where you actually need to fill the empty gaps in an identity column, as the purpose of this value is to be unrelated to the entity the table represents.
If you are going to remove all the data in the table, you can use TRUNCATE TABLE, which removes all rows and resets the identity to start from the seed value.
TRUNCATE TABLE MyTable
You can also use DBCC CHECKIDENT to reseed your identity to a specific value.
DBCC CHECKIDENT ('dbo.MyTable', RESEED, 1);
The code above reseeds the identity. Note that on a table that has already contained rows, the next row inserted gets the reseed value plus the increment (2 here), so reseed to 0 if you want the next value to be 1; on an empty, freshly truncated table, the first insert uses the seed value itself.
Finally, if you need to cover the gaps, be aware that IDENTITY alone cannot do it; you would need to implement your own custom logic that behaves like IDENTITY but fills the gaps.
There are two ways to do this.
If it is a non-PK column: insert the record with the same identity number (using SET IDENTITY_INSERT), then delete the row that previously used that value, as sketched after this list.
If you don't need the old data: drop and recreate the table, or truncate the table and use DBCC CHECKIDENT to reseed your identity to the specific value.
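A minimal sketch of the first approach, assuming a hypothetical dbo.MyTable with an identity column Id and a Name column:

-- Fill a gap value (3 here) explicitly, then remove whichever row it replaces
SET IDENTITY_INSERT dbo.MyTable ON;
INSERT INTO dbo.MyTable (Id, Name) VALUES (3, N'reused value');
SET IDENTITY_INSERT dbo.MyTable OFF;
DELETE FROM dbo.MyTable WHERE Id = 42;  -- hypothetical row being replaced

Remember that only one table per session can have IDENTITY_INSERT set to ON at a time.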
It is possible if you remove the autoincrement (IDENTITY) attribute from the ID column.
You can then check whether the table is empty or not and INSERT values accordingly.
Related
I have a table with a field that increments. The values for that field are 1, 2, 3...60 and then the field jumps to 1060.
I don't know why?
The next value for id should have been 61.
This is a side effect of a new feature in SQL Server 2012. Identity columns use the same internal mechanism as sequences, and values are cached by default to speed up inserting new rows. When an unplanned shutdown of SQL Server occurs, the cache is lost and identity values continue with a gap (usually 1000 for INT, or 10000 for BIGINT).
There could be several reasons - some of the most likely:
999 records were deleted
999 inserts were attempted and failed
Something reset the identity value using DBCC CHECKIDENT ({table}, RESEED, {value});
Something inserted a record with a specific ID using SET IDENTITY_INSERT ON
Autoincrement fields are not guaranteed to be consecutive; they are guaranteed to be unique. If you need them to be consecutive, I would recommend keeping track of the next ID yourself, which will require you to think about concurrency, foreign key references, etc.
I'm assuming you mean an IDENTITY field with an increment of 1.
This increases the value by 1 each time a new record is inserted.
To quote:
If an identity column exists for a table with frequent deletions, gaps can occur between identity values. If this is a concern, do not use the IDENTITY property. However, to ensure that no gaps have been created or to fill an existing gap, evaluate the existing identity values before explicitly entering one with SET IDENTITY_INSERT ON.
As alluded to in the comments, it's likely this table once had data populated with all the missing values, but these were deleted. (As this is quite a big chunk, maybe it was done in bulk?)
I recently had to move a database (SQL Server 2008) to a different server, and I have noticed that in one of the tables the identity column has started to get some unexpected values. It's set as an identity column with identity increment 1 and identity seed 1. After every 10 or so consecutive entries, it starts from another much higher number, increments by 1 for the next 10 or so entries, and then jumps up to another higher number. I can't seem to figure out the issue.
Sorry for the layman language. I am not a DB person.
This is likely not an issue with your identity key but with a framework being used to insert data, or with a stored procedure. If you have a stored procedure that inserts data but is then ROLLBACK'd, the ID was reserved but the row is 'deleted'.
So there are two places to check. One is the frameworks you're using (NHibernate, or Entity Framework, etc.): those frameworks might be inserting rows and then deleting them. The second is INSERT statements in stored procedures and other places where you might expect a ROLLBACK.
See: SQL Identity (autonumber) is Incremented Even with a Transaction Rollback
Another possibility is that you are examining the data without sorting it. When you ported the data, you may have assumed it would always be inserted or retrieved in ID order, but because the new table is not indexed the same way, you won't necessarily see items in primary-key order. This is less likely if rows appear sequential most of the time with gaps, but it's worth mentioning.
The following will shed some light on your issue.
Create a table with an identity auto increment column
Insert 10 rows of random data
You will see that the ID values are 1,2,3,4,5,6,7,8,9,10
delete from mytable where id=10
insert another row into the table
now you will see that the ID values are 1,2,3,4,5,6,7,8,9,11: the deleted value 10 is not reused.
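Put together as a runnable sketch (table and column names are made up for illustration):

CREATE TABLE dbo.MyTable (Id int IDENTITY(1,1) PRIMARY KEY, Payload varchar(50));

-- Insert 10 rows (sys.objects is used only as a convenient row source)
INSERT INTO dbo.MyTable (Payload)
SELECT TOP (10) 'random data'
FROM sys.objects;

DELETE FROM dbo.MyTable WHERE Id = 10;

INSERT INTO dbo.MyTable (Payload) VALUES ('one more row');

SELECT Id FROM dbo.MyTable ORDER BY Id;  -- returns 1..9, then 11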
SQL Server 2012 introduced Sequence as a new feature, the same as in Oracle and PostgreSQL. Where are sequences preferred over identities? And why do we need sequences?
I think you will find your answer here
Using the identity attribute for a column, you can easily generate auto-incrementing numbers (which are often used as a primary key). With Sequence, it will be a different object which you can attach to a table column while inserting. Unlike identity, the next number for the column value will be retrieved from memory rather than from the disk – this makes Sequence significantly faster than Identity. We will see this in coming examples.
And here:
Sequences: Sequences have been requested by the SQL Server community for years, and it's included in this release. Sequence is a user-defined object that generates a sequence of a number. Here is an example using Sequence.
and here as well:
A SQL Server sequence object generates a sequence of numbers, just like an identity column in SQL tables. But the advantage of sequence numbers is that the sequence number object is not limited to a single SQL table.
and on MSDN you can also read more about usage and why we need it (here):
A sequence is a user-defined schema-bound object that generates a sequence of numeric values according to the specification with which the sequence was created. The sequence of numeric values is generated in an ascending or descending order at a defined interval and may cycle (repeat) as requested. Sequences, unlike identity columns, are not associated with tables. An application refers to a sequence object to receive its next value. The relationship between sequences and tables is controlled by the application. User applications can reference a sequence object and coordinate the values across multiple rows and tables.
A sequence is created independently of the tables by using the CREATE SEQUENCE statement. Options enable you to control the increment, maximum and minimum values, starting point, automatic restarting capability, and caching to improve performance. For information about the options, see CREATE SEQUENCE.
Unlike identity column values, which are generated when rows are inserted, an application can obtain the next sequence number before inserting the row by calling the NEXT VALUE FOR function. The sequence number is allocated when NEXT VALUE FOR is called even if the number is never inserted into a table. The NEXT VALUE FOR function can be used as the default value for a column in a table definition. Use sp_sequence_get_range to get a range of multiple sequence numbers at once.
A sequence can be defined as any integer data type. If the data type is not specified, a sequence defaults to bigint.
Sequence and identity are both used to generate auto-numbers, but the major difference is that identity is table-dependent while a sequence is independent of any table.
So if you have a scenario where you need to maintain an auto-number globally (across multiple tables), need to restart your interval after a particular number, or need caching for performance, that is where you need a sequence rather than an identity.
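A minimal sketch of such a sequence (the name dbo.GlobalNo and the numbers are made up):

CREATE SEQUENCE dbo.GlobalNo AS int
    START WITH 1
    INCREMENT BY 1
    MINVALUE 1
    MAXVALUE 1000
    CYCLE        -- after 1000, restart at MINVALUE (1)
    CACHE 50;    -- cache a block of values for performance

-- The same sequence can be used from any table or query:
SELECT NEXT VALUE FOR dbo.GlobalNo;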
Although sequences provide more flexibility than identity columns, I didn't find they had any performance benefits.
I found performance using identity was consistently 3x faster than using sequence for batch inserts.
I inserted approx 1.5M rows and performance was:
14 seconds for identity
45 seconds for sequence
I inserted the rows into a table which used the sequence object via a table default:
NEXT VALUE for <seq> for <col_name>
and also tried specifying sequence value in select statement:
SELECT NEXT VALUE for <seq>, <other columns> from <table>
Both were the same factor slower than the identity method. I used the default cache option for the sequence.
The article referenced in Arion's first link shows performance for row-by-row insert and difference between identity and sequence was 16.6 seconds to 14.3 seconds for 10,000 inserts.
The caching option has a big impact on performance, but identity is faster for higher volumes (1M+ rows).
See this link for an in-depth analysis, as per utly4life's comment.
I know this is a little old, but wanted to add an observation that bit me.
I switched from identity to sequence to keep my indexes in order, and later found out that sequences don't transfer with replication. I started getting key violations after I set up replication between two databases, since the sequences were not in sync. Just something to watch out for before you make a decision.
I find the best use of sequences is not to replace an identity column but to create an "Order Number" type of field.
In other words, an Order Number is exposed to the end user and may have business rules attached to it. You want it to be unique, but just using an identity column isn't really correct either.
For example, different order types might require a different sequence, so you might have a sequence for Internet Order, as opposed to In-house orders.
In other words, don't think of a sequence as simply a replacement for identity; think of it as useful in cases where an identity does not fit the business requirements.
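For example, a sketch of that idea (sequence names, start values, and the Orders table are made up):

CREATE SEQUENCE dbo.InternetOrderNo START WITH 100000 INCREMENT BY 1;
CREATE SEQUENCE dbo.InHouseOrderNo  START WITH 500000 INCREMENT BY 1;

-- Pick the sequence according to the order type at insert time:
INSERT INTO dbo.Orders (OrderNo, OrderType)
VALUES (NEXT VALUE FOR dbo.InternetOrderNo, 'Internet');

INSERT INTO dbo.Orders (OrderNo, OrderType)
VALUES (NEXT VALUE FOR dbo.InHouseOrderNo, 'InHouse');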
I was recently bitten by something to consider for identity vs. sequence. It seems Microsoft now suggests a sequence if you want to keep an identity-style key without gaps. We had an issue where there were huge gaps in the identity values, and the statement quoted below explains it: SQL Server cached the identity values, and after a reboot we lost those numbers.
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-table-transact-sql-identity-property?view=sql-server-2017
Consecutive values after server restart or other failures – SQL Server might cache identity values for performance reasons and some of the assigned values can be lost during a database failure or server restart. This can result in gaps in the identity value upon insert. If gaps are not acceptable then the application should use its own mechanism to generate key values. Using a sequence generator with the NOCACHE option can limit the gaps to transactions that are never committed.
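A minimal sketch of that mitigation (the sequence name is made up):

-- NO CACHE trades insert speed for smaller gaps after a crash or restart
CREATE SEQUENCE dbo.KeySeq AS bigint
    START WITH 1
    INCREMENT BY 1
    NO CACHE;

SELECT NEXT VALUE FOR dbo.KeySeq;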
Similar Question: How to automatically reseed after using identity_insert?
I have a stored procedure that runs once a day. Part of this procedure is to delete ~1 million rows from a table and re-insert a similar number of rows (there are millions of other rows that are not affected). Once it finishes, it rebuilds all the indexes on the table ready for use.
Performance-wise I have no problem; the only annoyance is the excessive size of the primary key. The primary key is an IDENTITY INT, and obviously, deleting a million rows then re-inserting leaves me with a ~1,000,000 gap every day.
So I want to reseed the ID column after the DELETE, prior to the INSERT.
I know that I can use CHECKIDENT to reseed the identity column, but only if the new seed value is known before calling CHECKIDENT, i.e. by using MAX() immediately beforehand.
My table is a very high-traffic table. Whilst it's unlikely, there is a small chance that other inserts/deletes may be attempted during this process, so I am a bit worried about resorting to MAX()+1 to determine my new seed value (i.e. a row being inserted from elsewhere after the MAX() was performed but prior to the completion of CHECKIDENT, which would result in an error), e.g.:
DECLARE @NewSeed INT;
SELECT @NewSeed = ISNULL(MAX(IdentityColumn), 0)
FROM tbl_Whatever;
DBCC CHECKIDENT('tbl_Whatever', RESEED, @NewSeed);
My question: is there a slicker way to perform a reseed based on the current max value of the identity column? I'm assuming the answer is no, but I thought I'd ask.
SQL Version is 2008 R2
Use a transaction to update the seed. This operation is so fast that you don't have to worry about locks.
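A minimal sketch of that idea, assuming an exclusive table lock is acceptable for the instant it takes (names are taken from the question):

BEGIN TRANSACTION;

DECLARE @NewSeed INT;

-- TABLOCKX + HOLDLOCK keeps other inserts out between MAX() and the reseed
SELECT @NewSeed = ISNULL(MAX(IdentityColumn), 0)
FROM tbl_Whatever WITH (TABLOCKX, HOLDLOCK);

DBCC CHECKIDENT('tbl_Whatever', RESEED, @NewSeed);

COMMIT TRANSACTION;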
As Dems said, setting the identity to MAX + 1 does not eliminate gaps.
This is more complex but it does fill in gaps.
Delete
Reset the identity (in a transaction)
Determine if there are gaps
If there are gaps, insert rows with SET IDENTITY_INSERT ON and fill in the gaps
Clearly the stored procedure needs to determine the next gap value; a sketch of one way to do that follows below.
If you are inserting as many rows as you delete, you don't even need to reset the identity.
I think you can even have other inserts running concurrently, and they will use the automatic identity.
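A minimal sketch of finding the gap start values (table and column names are from the question and are illustrative):

-- Each Id whose successor is missing marks the start of a gap
SELECT t.IdentityColumn + 1 AS GapStart
FROM tbl_Whatever AS t
LEFT JOIN tbl_Whatever AS nxt
       ON nxt.IdentityColumn = t.IdentityColumn + 1
WHERE nxt.IdentityColumn IS NULL
  AND t.IdentityColumn < (SELECT MAX(IdentityColumn) FROM tbl_Whatever);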
From the documentation (http://msdn.microsoft.com/en-us/library/ms188059.aspx):
At any time, only one table in a session can have the IDENTITY_INSERT property set to ON. If a table already has this property set to ON, and a SET IDENTITY_INSERT ON statement is issued for another table, SQL Server returns an error message that states SET IDENTITY_INSERT is already ON and reports the table it is set ON for.
If the value inserted is larger than the current identity value for the table, SQL Server automatically uses the new inserted value as the current identity value.
I have a SQL Server table in production that has millions of rows, and it turns out that I need to add a column to it. Or, to be more accurate, I need to add a field to the entity that the table represents.
Syntactically this isn't a problem, and if the table didn't have so many rows and wasn't in production, this would be easy.
Really what I'm after is the course of action. There are plenty of websites out there with extremely large tables, and they must add fields from time to time. How do they do it without substantial downtime?
One thing I should add, I did not want the column to allow nulls, which would mean that I'd need to have a default value.
So I either need to figure out how to add a column with a default value in a timely manner, or I need to figure out a way to update the column at a later time and then set the column to not allow nulls.
ALTER TABLE table1 ADD
newcolumn int NULL
GO
should not take that long... What takes a long time is inserting columns in the middle of other columns, because then the engine needs to create a new table and copy the data to the new table.
I did not want the column to allow nulls, which would mean that I'd need to have a default value.
Adding a NOT NULL column with a DEFAULT constraint to a table of any number of rows (even billions) became a lot easier starting in SQL Server 2012 (but only in Enterprise Edition), as it is allowed to be an online operation (in most cases): for existing rows, the value is read from metadata and not actually stored in the row until the row is updated or the clustered index is rebuilt. Rather than paraphrase any more, here is the relevant section from the MSDN page for ALTER TABLE:
Adding NOT NULL Columns as an Online Operation
Starting with SQL Server 2012 Enterprise Edition, adding a NOT NULL column with a default value is an online operation when the default value is a runtime constant. This means that the operation is completed almost instantaneously regardless of the number of rows in the table. This is because the existing rows in the table are not updated during the operation; instead, the default value is stored only in the metadata of the table and the value is looked up as needed in queries that access these rows. This behavior is automatic; no additional syntax is required to implement the online operation beyond the ADD COLUMN syntax. A runtime constant is an expression that produces the same value at runtime for each row in the table regardless of its determinism. For example, the constant expression "My temporary data", or the system function GETUTCDATETIME() are runtime constants. In contrast, the functions NEWID() or NEWSEQUENTIALID() are not runtime constants because a unique value is produced for each row in the table. Adding a NOT NULL column with a default value that is not a runtime constant is always performed offline and an exclusive (SCH-M) lock is acquired for the duration of the operation.
While the existing rows reference the value stored in metadata, the default value is stored on the row for any new rows that are inserted and do not specify another value for the column. The default value stored in metadata is moved to an existing row when the row is updated (even if the actual column is not specified in the UPDATE statement), or if the table or clustered index is rebuilt.
Columns of type varchar(max), nvarchar(max), varbinary(max), xml, text, ntext, image, hierarchyid, geometry, geography, or CLR UDTS, cannot be added in an online operation. A column cannot be added online if doing so causes the maximum possible row size to exceed the 8,060 byte limit. The column is added as an offline operation in this case.
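For illustration, a sketch of such an add (the table and constraint names are made up):

-- On Enterprise Edition (2012+) this is an online, metadata-only operation,
-- because GETUTCDATE() is a runtime constant
ALTER TABLE dbo.BigTable
    ADD CreatedUtc datetime2 NOT NULL
        CONSTRAINT DF_BigTable_CreatedUtc DEFAULT GETUTCDATE();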
The only real solution for continuous uptime is redundancy.
I acknowledge @Nestor's answer that adding a new column shouldn't take long in SQL Server, but nevertheless it could still be an outage that is not acceptable on a production system. An alternative is to make the change in a parallel system and then, once the operation is complete, swap the new for the old.
For example, if you need to add a column, you may create a copy of the table, add the column to the copy, and then use sp_rename to move the old table aside and the new table into place.
If you have referential integrity constraints pointing to this table, this can make the swap even more tricky. You probably have to drop the constraints briefly as you swap the tables.
For some kinds of complex upgrades, you could completely duplicate the database on a separate server host. Once that's ready, just swap the DNS entries for the two servers and voilà!
I supported a stock exchange company in the 1990's who ran three duplicate database servers at all times. That way they could implement upgrades on one server, while retaining one production server and one failover server. Their operations had a standard procedure of rotating the three machines through production, failover, and maintenance roles every day. When they needed to upgrade hardware, software, or alter the database schema, it took three days to propagate the change through their servers, but they could do it with no interruption in service. All thanks to redundancy.
"Add the column and then perform relatively small UPDATE batches to populate the column with a default value. That should prevent any noticeable slowdowns"
And after that you have to set the column to NOT NULL, which will fire off as one big transaction. So everything will run really fast until you do that, so you have probably gained very little really. I only know this from first-hand experience.
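For reference, a sketch of the batched approach being discussed (table/column names and the batch size are made up):

ALTER TABLE dbo.BigTable ADD NewCol int NULL;
GO

-- Backfill in small batches to keep individual transactions short
WHILE 1 = 1
BEGIN
    UPDATE TOP (10000) dbo.BigTable
    SET NewCol = 0
    WHERE NewCol IS NULL;

    IF @@ROWCOUNT = 0 BREAK;
END
GO

-- The step this answer warns about: one big operation over the whole table
ALTER TABLE dbo.BigTable ALTER COLUMN NewCol int NOT NULL;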
You might want to rename the current table from X to Y. You can do this with: EXEC sp_rename '[OldTableName]', '[NewTableName]'.
Then recreate the new table as X with the new column set to NOT NULL, and batch-insert from Y to X, either supplying a value for the new column in your INSERT or placing a default on the new column when you recreate table X.
I have done this type of change on a table with hundreds of millions of rows. It still took over an hour, but it didn't blow out our transaction log. When I tried to just change the column to NOT NULL with all the data in the table, it took over 20 hours before I killed the process.
Have you tested just adding the column, filling it with data, and setting it to NOT NULL?
So in the end I don't think there's a magic bullet.
SELECT INTO a new table and rename. For example, adding column i to table A:
select *, 1 as i
into A_tmp
from A_tbl;

-- add any indexes here

exec sp_rename 'A_tbl', 'A_old';
exec sp_rename 'A_tmp', 'A_tbl';
Should be fast and won't touch your transaction log like inserting in batches might.
(I just did this today with a 70 million row table in under 2 minutes.)
You can wrap it in a transaction if you need it to be an online operation (something might change in the table between the select into and the renames).
Another technique is to add the column to a new related table (assume a one-to-one relationship, which you can enforce by giving the FK a unique index). You can then populate it in batches and add the join to this table wherever you want the data to appear. Note that I would only consider this for a column that I would not want to use in every query on the original table, or if the record width of my original table was getting too large, or if I was adding several columns.
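A minimal sketch of that related-table idea (all names are hypothetical):

CREATE TABLE dbo.BigTable_Extra
(
    BigTableId int NOT NULL
        CONSTRAINT PK_BigTable_Extra PRIMARY KEY   -- the PK doubles as the unique index enforcing 1:1
        CONSTRAINT FK_BigTable_Extra
            REFERENCES dbo.BigTable (Id),
    NewCol int NOT NULL
);

-- Populate in batches, then join wherever the data is needed:
-- SELECT b.*, e.NewCol
-- FROM dbo.BigTable AS b
-- LEFT JOIN dbo.BigTable_Extra AS e ON e.BigTableId = b.Id;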