Azure SQL Server identity key values jumping up by 1,000 - azure-sql-database

I am using the SQL Server 2012 that Microsoft provides for Azure. My identity columns have a habit of sometimes jumping up by 1,000, even though they are declared as "INT IDENTITY (1, 1) NOT NULL" in my table.
Is there anything I can do to stop this happening? And what about if I remove all rows from a table? Even after I delete every row, when I add a new row it starts off with an ID that's more than 1,000.

Refer to this post and this answer. Basically, this is by design, and the argument as to why it is not much of an issue is that Azure database size limits will be exceeded long before the identity limit is hit. There is also the option of using a bigint.
There is no official explanation of why the seed jumps when the database is bounced, but I am guessing it is a concurrency safeguard: the server pre-allocates a cache of identity values, and discarding the whole cached range on restart avoids any risk of the same identity being handed out twice at the boundary between shutting down and restarting.
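If you just want to see how far the counter has advanced, DBCC CHECKIDENT with NORESEED reports the current identity value without changing anything. A minimal sketch; the table name is a placeholder:

DBCC CHECKIDENT ('dbo.MyTable', NORESEED);
-- Reports the current identity value and the current maximum value
-- in the column; nothing is reseeded.

And if the INT range worries you, the bigint option mentioned above is just a matter of declaring the column wider - again with hypothetical names:

CREATE TABLE dbo.MyTable
(
    Id BIGINT IDENTITY (1, 1) NOT NULL PRIMARY KEY,
    Payload NVARCHAR(100) NULL  -- hypothetical data column
);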

Related

Cockroach oddly auto incrementing on PK id at server but not local (knex.js seeding cockroach public cloud hosted db)

I can't for the life of me figure out why this is happening. Each time I run my table delete/create migration and then the seeding logic on the server DB, it gives me bizarre 18-digit IDs instead of incrementing 1, 2, 3. Locally everything works fine; my DB is a free-tier hosted Cockroach database. I am seeding 3 records; here are example IDs that are generated: (725368665646530561, 725368665646694401, 725368665646727169)
EDIT:
Based on a comment and some extra research I found that CockroachDB, although Postgres-compatible, is not truly a Postgres DB. I also didn't realize how disliked and non-performant the AUTO_INCREMENT approach is. I ended up using an extension to generate UUIDs as the PK, and then I query the newly created data and grab those IDs if I need some FK relationships in another seed.
t.uuid('id').primary().notNullable().defaultTo(knex.raw('uuid_generate_v4()'));
The answer you linked to is a little old (from 2017). CockroachDB can generate auto-incrementing IDs, but there is a performance cost: (1) having a primary index on sequential data is worse for performance, and (2) extra coordination between nodes is required to generate incrementing values.
If those performance tradeoffs are acceptable to you, then you can use the serial_normalization=sql_sequence_cached setting to get what you want. See https://www.cockroachlabs.com/docs/stable/serial.html
Also, v21.2 supports identity columns, which are a different syntax for getting something similar: https://www.cockroachlabs.com/docs/stable/create-table.html#identity-columns
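For illustration, a minimal sketch of both options (table and column names are hypothetical; see the linked docs for the authoritative syntax):

-- Option A: make SERIAL use a cached sequence in this session,
-- giving small incrementing values instead of 18-digit unique_rowid() values
SET serial_normalization = sql_sequence_cached;
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name STRING NOT NULL
);

-- Option B (v21.2+): a standard identity column
CREATE TABLE accounts (
    id INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    name STRING NOT NULL
);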

SQL Server 2012 Identity column after a restore

I was just doing some tests on my SQL Server 2012 to practice backups and restores.
I created a simple table with a few columns, one of which was ID (int), marked as an identity column.
I added 3 records; the ID was automatically set to 1, then 2, then 3... OK.
I backed up the database, then did a full restore.
After the restore I tried adding a new record, but strangely, the new ID was 1003 (instead of 4), and on adding a new record again the ID was 1004 (so OK, it's again increasing from the previous one).
So I tried again to back up the database and then restore it.
Tried to add a new record and guess what... the new ID was 2004 (instead of 1005), and then the records after that were 2005, 2006 and so on.
It seems that after every restore the identity column is somehow messed up.
Is this a normal behavior?
Yes, this is normal behavior.
The IDENTITY mechanism reserves a chunk of values of a certain size and caches them. If your server goes down, or you do a backup/restore, the cached values that haven't been used yet from that reserved chunk are lost and won't come back.
And this is NOT a problem! The IDENTITY system just makes sure it gives you properly ascending values - it never guaranteed "no gaps" - and gaps are NOT a problem, really!
See Aaron Bertrand's blog post Bad Habits to Kick: Making Assumptions about IDENTITY for even more insight into IDENTITY and what to expect (and what not to expect) from it.
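If the post-restore jump really bothers you in a test database (the gaps are harmless in production), you can inspect and reset the counter yourself - a sketch with a placeholder table name:

DBCC CHECKIDENT ('dbo.MyTable', NORESEED);   -- report the current identity value
DBCC CHECKIDENT ('dbo.MyTable', RESEED, 3);  -- the next inserted row gets 4

Only reseed below the current maximum if you are certain no higher values exist in the column; otherwise the identity will start handing out values that are already in use.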

SQL SERVER Identity changes daily

I have a table which has an identity column, requestID. I have set the increment and seed values both to 1.
The requestId increases as expected, but it shows one unexpected behavior:
e.g. if today's last value is 500, the next day's identity starts at 1001.
How to solve this problem?
In SQL Server 2012, identity columns internally use sequences with CACHE for generating their values. While this improves performance, it has a side effect: an unexpected shutdown causes gaps in the sequence. There is nothing that can be done to prevent this. The official Microsoft guidance is that the application should not rely on identity values being sequential. 'Unexpected' shutdown includes a power loss, a Windows restart, and stopping the SQL Server instance using NET STOP.
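(A side note not in the original answer: SQL Server 2017 and Azure SQL Database later added a documented switch that disables identity caching, trading some insert throughput for smaller gaps; on SQL Server 2012 itself the commonly cited workaround was startup trace flag 272.)

-- SQL Server 2017+ / Azure SQL Database only
ALTER DATABASE SCOPED CONFIGURATION SET IDENTITY_CACHE = OFF;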
Why is this a problem? Most likely there were 500 attempts to insert something and they all failed - failed or rolled-back inserts consume identity values too. It is not a requirement (and not the intention of an auto-increment column) to store consecutive numbers. Its intention is to provide unique numbers where each new number is greater than the previous one, thus giving a sort order matching the insert sequence.
You could run the following command daily:
DBCC CHECKIDENT ("dbo.MyTable", RESEED, 0);
I assume that you're trying to keep track of the number of requests per day, but you could see counting problems if any inserts fail, i.e. there might be gaps in your numbers.
IMO it would be better not to reset the field each day, and instead store the insertion time in a column on each new record. Then you can query by date and get a count, as sketched below.
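A minimal sketch of that approach (table and column names are hypothetical):

ALTER TABLE dbo.Requests
    ADD CreatedAt DATETIME2 NOT NULL
        CONSTRAINT DF_Requests_CreatedAt DEFAULT SYSUTCDATETIME();

-- Count requests per day instead of reading meaning into the identity value
SELECT CAST(CreatedAt AS DATE) AS RequestDate,
       COUNT(*) AS RequestCount
FROM dbo.Requests
GROUP BY CAST(CreatedAt AS DATE)
ORDER BY RequestDate;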

Reinserting rows with identity columns

I'm implementing a queue in SQL Server (2008 R2) containing jobs that are to be performed. Upon completion, the job is moved to a history table, setting a flag to success or failure. The items in the queue table have an identity column as the primary key. The history table has a combination of this ID and a timestamp as its PK.
If a job fails, I would like the option to re-run it, and the way this is intended to work is to move it back from the history table into the live queue. For traceability purposes, I would like the reinserted row to have the same ID as the original entry, which causes problems as this is an identity column.
I see two possible solutions:
1) Use IDENTITY_INSERT:
SET IDENTITY_INSERT TableName ON
-- Move from history to live queue
SET IDENTITY_INSERT TableName OFF
2) Create some custom logic to generate unique IDs, like getting the max ID value from both the live and history queue and adding one.
I don't see any real problems with 2, apart from it being messy, possibly performing poorly, and making my neurotic skin crawl...
Option 1 I like, but I don't know the implications well enough. How will it perform? And I know that doing this to two tables at the same time will make things crash and burn. What happens if two threads do this to the same table at the same time?
Is this at all a good way to do this in semi-commonly used stored procedures, or should the technique just be used for batch-inserting data once in a blue moon?
Any thoughts on which is the best option, or is there a better way?
I'd go with Option 1 - Use IDENTITY_INSERT
SET IDENTITY_INSERT TableName ON
-- Move from history to live queue
SET IDENTITY_INSERT TableName OFF
IDENTITY_INSERT is a setting that applies to the current connection, so if another connection is doing something similar, it will have no impact on yours. The only way to get an error with it is to attempt to set it ON for a second table without first turning it OFF on the first one. A concrete sketch follows.
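Here is one way the move could look, with hypothetical table and column names, assuming one history row per job, and wrapped in a transaction so the copy and the delete succeed or fail together:

DECLARE @JobId INT = 42;  -- hypothetical job to re-run

BEGIN TRANSACTION;

SET IDENTITY_INSERT dbo.QueueTable ON;

INSERT INTO dbo.QueueTable (Id, Payload)
SELECT Id, Payload
FROM dbo.HistoryTable
WHERE Id = @JobId;

SET IDENTITY_INSERT dbo.QueueTable OFF;

-- Optional: remove the failed entry from history, since it is being moved back
DELETE FROM dbo.HistoryTable WHERE Id = @JobId;

COMMIT TRANSACTION;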
Can't you use the original (live) identity value to insert into the history table? You say you combine it with a timestamp anyway.
Assuming that the Queue's Identity column is the one assigning "Job IDs", I would think the simplest solution would be to add a new "OriginalJobID" nullable column, potentially with FK pointing to the history table. Then when you are rerunning a job, allow it to get a new ID as it is added to the queue, but have it keep a reference to the original job in this new column.
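A sketch of that alternative (names again hypothetical): the job is re-queued under a fresh identity, and traceability comes from the pointer column instead of a reused ID.

ALTER TABLE dbo.QueueTable ADD OriginalJobId INT NULL;

DECLARE @JobId INT = 42;  -- hypothetical failed job

-- The new row gets a brand-new identity; OriginalJobId records where it came from
INSERT INTO dbo.QueueTable (Payload, OriginalJobId)
SELECT Payload, Id
FROM dbo.HistoryTable
WHERE Id = @JobId;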
To answer "or should this technique just be used for batch inserting data once in a blue moon", I would say yes, definitely - that's exactly what it's for.
Oops, @Damien_The_Unbeliever is right - I'd forgotten that the IDENTITY_INSERT setting is per connection. It would be complicated to get yourself into real trouble with the identity-insert approach (it would take something like MARS, I guess, or bad error handling). Nonetheless, I think trying to reuse IDs is a mistake!
I can see a potential performance issue when reusing identity values, and that is if the identity column is indexed by a clustered index.
A strictly growing number causes inserted rows to always be added at the end of the clustered index, so no page splits occur.
If you start to insert reused (lower) numbers, you may cause page splits during those insertions.
Whether that is a problem depends on your domain.

SQL Identity Column out of step

We have a set of databases that have a table defined with an identity column as the primary key. As a subset of these are replicated to other servers, a seed system was created so that they could never clash: each database gets a different starting seed, with an increment of 50.
In this way the table on DB1 would generate 30001, 30051, etc., where Database2 would generate 30002, 30052 and so on.
I am looking at adding another database into this system (it is split for scaling/load purposes) and have discovered that the identities have got out of sync on one or two of the databases - i.e. database 3, which should have numbers ending in 3, doesn't anymore. The seeding and increments are still correct according to the table design.
I am obviously going to have to work around this problem somehow (probably by setting a high initial value), but can anyone tell me what would cause them to get out of sync like this? From a query on the DB I can see the sequence went as follows: 32403, 32453, 32456, 32474, 32524, 32574, and it has continued in increments of 50 ever since it went wrong.
As far as I am aware, no bulk inserts, DTS packages or anything like that have put new data into these tables.
Second (bonus) question - how do I reset the identity so that it goes back to what I want it to actually be?
EDIT:
I know the design is in principle a bit ropey - I didn't ask for criticism of it; I just wondered how it could have got out of sync. I inherited this system, and changing the column to a GUID - whilst undoubtedly the best theoretical solution - is probably not going to happen. The system evolved from a single DB to multiple DBs when the load got too large (a few hundred GBs currently). Each ID in this table is referenced in many other places - sometimes a few hundred thousand times each (multiplied by about 40,000 for each item). Updating all of those will not be happening ;-)
Replication = GUID column.
To set the value of the next ID to be 1000:
DBCC CHECKIDENT (orders, RESEED, 999)
If you want to actually use primary keys for some meaningful purpose other than uniquely identifying a row in a table, then it's not an identity column, and you need to assign the values in some other explicit way.
If you want to merge rows from multiple tables, then you are violating the intent of IDENTITY, which is for one table. (A GUID column will use values that are unique enough to solve this problem. But you still can't impute a meaningful purpose to them.)
Perhaps somebody used:
SET IDENTITY_INSERT {tablename} ON
INSERT INTO {tablename} (ID, ...)
VALUES (32456, ....)
SET IDENTITY_INSERT {tablename} OFF
Or perhaps they used DBCC CHECKIDENT to change the identity. In any case, you can use the same to set it back.
It's too risky to rely on this kind of identity strategy, since it's (obviously) possible for it to get out of sync and wreck everything.
With replication, you really need to identify your data with GUIDs. It will probably be easier for you to migrate your data to a schema that uses GUIDs for PKs than to try and hack your way around IDENTITY issues.
To address your questions directly:
Why it got out of sync may be interesting to discuss, but the only result you could draw from the answer would be a way to prevent it in the future - and patching this design is a bad course of action. You will continue to have these and bigger problems unless you deal with the design, which has a fatal flaw.
How to set the existing values right is also (IMHO) an invalid question, because you need to do something other than set the values right - that alone won't solve your problem.
This isn't to disparage you; it's to help you the best way I can think of. Changing the design is less work both short term and long term. Not changing the design is the pathway to FAIL.
This doesn't really answer your core question, but one possibility for addressing the design would be to switch to a hi/lo algorithm. It wouldn't require changing the column away from an int, so it shouldn't be nearly as much work as changing to a GUID.
Hi/lo is used by the NHibernate ORM, but I couldn't find much documentation on it.
Basically, the way hi/lo works is that you have one central place where you keep track of your "hi" value: one table, in one of the databases, that every instance of your insert application can see. You then need some kind of service (object, web service, whatever) whose lifetime is somewhat longer than a single entity insert. When it starts up, this service goes to the hi table, grabs the current value, and increments the value in that table, using a read-committed lock so that you don't get concurrency issues with other instances of the service. You then use the service to get your next ID value: it starts internally at the number it got from the DB and, each time it hands a value out, increments by 1, keeping track of the current value and the "range" it's allowed to hand out. A simplistic example would be this:
Service 1 gets 100 from the hi_value table in the DB and increments the DB value to 200.
Service 1 gets a request for a new ID; it hands out 100.
Another instance of the service, service 2 (another thread, another middle-tier worker machine, etc.), spins up, gets 200 from the DB, and increments the DB value to 300.
Service 2 gets a request for a new ID; it hands out 200.
Service 1 gets a request for a new ID; it hands out 101.
If any instance ever gets to handing out more than 100 values before dying, it simply goes back to the DB, gets the current value, increments it, and starts over. Obviously there's some art to this - how big the range should be, and so on.
A very simple variation on this is a single table in one of your DBs that just contains the "nextId" value - basically manually reproducing Oracle's sequence concept.
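A minimal T-SQL sketch of the central table and the block-reservation step (all names are hypothetical; the single UPDATE both reads and advances the value atomically, which sidesteps the concurrency concern above):

CREATE TABLE dbo.HiValue (NextHi INT NOT NULL);
INSERT INTO dbo.HiValue (NextHi) VALUES (100);

-- Each service instance reserves a block of 100 IDs in one atomic statement:
DECLARE @hi INT;

UPDATE dbo.HiValue
SET @hi = NextHi,           -- capture the old value
    NextHi = NextHi + 100;  -- advance it for the next caller

-- The instance may now hand out @hi, @hi + 1, ..., @hi + 99
-- before coming back for another block.
SELECT @hi AS BlockStart;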