New IDENTITY column on existing table - what will be the order? - sql

I added an IDENTITY column to an existing table in SQL Server 2012. Expected the numbers to be generated in the order of clustered index (non-sequential GUID), but to my surprise they got generated in the order of one of the indexes which is not even unique, and also coincidentally exactly the order I wanted!
Can anyone explain this? Here are the details of the table:
id - guid (non-sequential), clustered index, primary key
eventdate - bigint (unix date), not null, non-unique index
more columns, some indexed, all indexes non-unique
Identity values got assigned in the order of eventdate. I even found a few examples where several rows had the same eventdate, and they always had sequential identity numbers.

MSDN says that the order in which identity values are generated for the new column of existing table is not defined.
IDENTITY
Specifies that the new column is an identity column. The
SQL Server Database Engine provides a unique, incremental value for
the column. When you add identifier columns to existing tables, the
identity numbers are added to the existing rows of the table with the
seed and increment values. The order in which the rows are updated
is not guaranteed. Identity numbers are also generated for any new
rows that are added.
So, you'd better check thoroughly that you got new IDENTITY values in the order that you need. Check all rows of the table.
Edit
"The order is not guaranteed" doesn't mean that it is random, it just means that optimizer is free to pick any method to scan the table. In your system it apparently picked that index on eventdate (maybe it has the least amount of pages), but on another hardware or another version of the server the choice may change and you should not rely on it. Any change to the table structure or indexes may change the choice as well. Most likely the optimizer's decision is deterministic (i.e. not random), but it is not disclosed in the docs and may depend on many internal things and may change at any time.
Your result is not unexpected. Identity values were assigned in some unspecified order, which coincided with the order of the index.

Related

IDENTITY SEED is incrementing based on other tables seed values

I'm using SQL SERVER 2017 and using SSMS. I have created a few tables whose Primary Key is int and enabled Is Identity and set Identity Increment = 1 and Identity Seed=1 For all the tables I have used the same method. But When I added one record in a table say Lead it's ID was 2, Then added value to the table say Followup then its ID was 3.
Here I'm adding the screenshots for a better understanding
Lead Table
Followup Table
Is there any option available to avoid this? can we keep the identity individual for each table?
The documentation is quite specific about what identity does not guarantee:
The identity property on a column does not guarantee the following:
Uniqueness of the value . . .
Consecutive values within a transaction . . .
Consecutive values after server restart or other failures . . .
Reuse of values
In general, the "uniqueness" property is a non-issue, because identity columns are usually the primary key (or routinely declared at least unique), which does guarantee uniqueness.
The purpose of an identity column is to provide a unique numeric identifier different from other rows, so it can be readily used as a primary key. There are no other guarantees. And for performance SQL Server has lots of short-cuts that result in gaps.
If you want no gaps, then the simplest way is to assign a value when querying:
row_number() over (order by <identity column>)
That is not 100% satisfying, because deletions can affect the value. I also think that on parallel systems, inserts can as well (because identities might be cached on individual nodes).
If you do not care about performance, you can use a sequence for assigning a value. This is less performant than using an identity. Basically, it requires serializing all the inserts to guarantee the properties of the insert.
I should note that even with a sequence, a failed insert can still produce gaps, so it still might not do what you want.

Insert a record at the particular row in a psql table [duplicate]

I am just wondering if I can maintain the order of insertion of data in SQL Server?
I am working on my own project and it is kind of blog site having a number of posts. I will save my posts to my SQL Server but I want them in the order of insertion.
Question: I understand that If I use auto-incrementing integer in SQL Server as the primary key, I can maintain the order of insertion. But I want to use "Guid" for primary key, instead of identity. Then "Guid" does not seem to maintain the order of insertion.
Should I use both auto-incrementing integer for order of insertion and Guid for identity?
Or Is there any other way to maintain the order of insertion with Guid set to primary key?
Related question: the reason why I want to maintain the order of insertion is that in that way, I don't have to use order by clause which causes extra sorting process.
But should not I trust the order of data returned from database without any ordering clauses like order by?
Should I always use some ordering clauses for ordering my data? Then what if I have a massive amount of data to order? Then how can I handle the situation?
Sorry for too many questions in a post but I believe they are all related.
You are misguided.
SQL tables represent unordered sets. If you want a result set in a particular order, then you need to use an ORDER BY clause in the query. The SQL optimizer might not use the ORDER BY, finding another way to return the results in order.
You can have an identity column that is not the primary key. But actually, you can have both an identity column and a guid column, with the former as the primary key and the latter as a unique key. Another solution is to have a CreatedAt datetime. You can use this for ordering . . . or even as a clustered index if you really wanted to.
When you specify ORDER BY in an INSERT...SELECT statement, SQL Server will assign IDENTITY values in the order specified. However, that does not mean rows are necessarily inserted in that order.
If you need rows returned in a specific order, you must use ORDER BY to guarantee ordering. Indexes can be leverage provide order data efficiently depending on the query particulars.

Confusing t-sql exam answer about sequence or uniqueidentifier

I found a t-sql question and its answer. It is too confusing. I could use a little help.
The question is:
You develop a database application. You create four tables. Each table stores different categories of products. You create a Primary Key field on each table.
You need to ensure that the following requirements are met:
The fields must use the minimum amount of space.
The fields must be an incrementing series of values.
The values must be unique among the four tables.
What should you do?
A. Create a ROWVERSION column.
B. Create a SEQUENCE object that uses the INTEGER data type.
C. Use the INTEGER data type along with IDENTITY
D. Use the UNIQUEIDENTIFIER data type along with NEWSEQUENTIALID()
E. Create a TIMESTAMP column.
The said answer is D. But, I think the more suitable answer is B. Because sequence will use less space than GUID and it satisfies all the requirements.
D is a wrong answer, because NEWSEQUENTIALID doesn't guarantee "an incrementing series of values" (second requirement).
NEWSEQUENTIALID()
Creates a GUID that is greater than any GUID
previously generated by this function on a specified computer since
Windows was started. After restarting Windows, the GUID can start
again from a lower range, but is still globally unique.
I'd say that B (sequence) is the correct answer. At least, you can use a sequence to fulfil all three requirements, if you don't restart/recycle it manually. I think it is the easiest way to meet all three requirements.
Between the choices provided D B is the correct answer, since it meets all requirements:
ROWVERSION is a bad choice for a primary key, as stated in MSDN:
Every time that a row with a rowversion column is modified or inserted, the incremented database rowversion value is inserted in the rowversion column. This property makes a rowversion column a poor candidate for keys, especially primary keys. Any update made to the row changes the rowversion value and, therefore, changes the key value. If the column is in a primary key, the old key value is no longer valid, and foreign keys referencing the old value are no longer valid.
TIMESTAMP is deprecated, as stated in that same page:
The timestamp syntax is deprecated. This feature will be removed in a future version of Microsoft SQL Server. Avoid using this feature in new development work, and plan to modify applications that currently use this feature.
An IDENTITY column does not guarantee uniqueness, unless all it's values are only ever generated automatically (you can use SET IDENTITY_INSERT to insert values manually), nor does it guarantee uniqueness between tables for any value.
A GUID is practically guaranteed to be unique per system, so if a guid is the primary key for all 4 tables it ensures uniqueness for all tables. the one requirement it doesn't fulfill is storage size - It's storage size is quadruple that of int (16 bytes instead of 4).
A SEQUENCE, when is not declared as recycle, guarantee uniqueness, and has the lowest storage size.
The sequence of numeric values is generated in an ascending or descending order at a defined interval and can be configured to restart (cycle) when exhausted.
However,
I would actually probably choose a different option all together - create a base table with a single identity column and link it with a 1:1 relationship with all other categories. then use an instead of insert trigger for all categories tables that will first insert a record to the base table and then use scope_identity() to get the value and insert it as the primary key for the category table.
This will enforce uniqueness as well as make it possible to use a single foreign key reference between the categories and products.
The issue has been discussed extensively in the past, in general:
http://blog.codinghorror.com/primary-keys-ids-versus-guids/
The constraint #3 is why a SEQUENCE could run into issues as there is a higher risk of collision/lowered number of possible rows in each table.

Best approach for multi-tenant primary keys

I have a database used by several clients. I don't really want surrogate incremental key values to bleed between clients. I want the numbering to start from 1 and be client specific.
I'll use a two-part composite key of the tenant_id as well as an incremental id.
What is the best way to create an incremental key per tenant?
I am using SQL Server Azure. I'm concerned about locking tables, duplicate keys, etc. I'd typically set the primary key to IDENTITY and move on.
Thanks
Are you planning on using SQL Azure Federations in the future? If so, the current version of SQL Azure Federations does not support the use of IDENTITY as part of a clustered index. See this What alternatives exist to using guid as clustered index on tables in SQL Azure (Federations) for more details.
If you haven't looked at Federations yet, you might want to check it out as it provides an interesting way to both shard the database and for tenant isolation within the database.
Depending upon your end goal, using Federations you might be able to use a GUID as the primary clustered index on the table and also use an incremental INT IDENTITY field on the table. This INT IDENTITY field could be shown to end-users. If you are federating on the TenantID each "Tenant table" effectively becomes a silo (as I understand it at least) so the use of IDENTITY on a field within that table would effectively be an ever increasing auto generated value which increments within a given Tenant.
When \ if data is merged together (combining data from multiple Tenants) you would wind up with collisions on this INT IDENTITY field (hence why IDENTITY isn't supported as a primary key in federations) but as long as you aren't using this field as a unique identifier within the system at large you should be ok.
If you're looking to duplicate the convenience of having an automatically assigned unique INT key upon insert, you could add an INSTEAD OF INSERT trigger that uses MAX of the existing column +1 to determine the next value.
If the column with the identity value is the first key in an index, the MAX query will be a simple index seek, very efficient.
Transactions will ensure that unique values are assigned but this approach will have different locking semantics than the standard identity column. IIRC, SQL Server can allocate a different identity value for each transaction that requests it in parallel and if a transaction is rolled back, the value(s) allocated to it are discarded. The MAX approach would only allow one transaction to insert rows into the table at a time.
A related approach could be to have a dedicated key value table keyed by the table name, tenant ID and current identity value. It would require the same INSTEAD OF INSERT trigger and more boilerplate to query and keep that key table updated. It wouldn't improve parallel operations though; the lock would just be on a different table's record.
One possibility to fix the locking bottleneck would be to include the current SPID in the key's value (now the identity key is a combination of sequential int and whatever SPID happened to allocate it and not simply sequential), use the dedicated identity value table and insert records there per SPID as necessary; the identity table PK would be (table name, tenant, SPID) and have a non-key column with the current sequential value. That way, each SPID would have its own dynamically allocated identity pool and would only ever have its own SPID specific records locked.
Another downside is maintaining triggers that have to be updated whenever you change the columns in any of the special identity tables.

identity column in Sql server

Why does Sql server doesn't allow more than one IDENTITY column in a table?? Any specific reasons.
Why would you need it? SQL Server keeps track of a single value (current identity value) for each table with IDENTITY column so it can have just one identity column per table.
An Identity column is a column ( also known as a field ) in a database table that :-
Uniquely identifies every row in the table
Is made up of values generated by the database
This is much like an AutoNumber field in Microsoft Access or a sequence in Oracle.
An identity column differs from a primary key in that its values are managed by the server and ( except in rare cases ) can't be modified. In many cases an identity column is used as a primary key, however this is not always the case.
SQL server uses the identity column as the key value to refer to a particular row. So only a single identity column can be created. Also if no identity columns are explicitly stated, Sql server internally stores a separate column which contains key value for each row. As stated if you want more than one column to be having unique value, you can make use of UNIQUE keyword.
The SQL Server stores the identity in an internal table, using the id of the table as it's key. So it's impossible for the SQL Server to have more than one Identity column per table.
Because MS realized that better than 80% of users would only want one auto-increment column per table and the work-around to have a second (or more) is simple enough i.e. create an IDENTITY with seed = 1, increment = 1 then a calculated column multiplying the auto-generated value by a factor to change the increment and adding an offset to change the seed.
Yes , Sequences allow more than one identity like columns in atable , but there are some issues here . In a typical development scenario i have seen developers manually inserting valid values in a column (which is suppose to be inserted through sequence) . Later on when a sequence try inserting value in to the table , it may fail due to unique key violation.
Also , in a multi developer / multi vendor scenario, developers might use the same sequence for more than one table (as sequences are not linked to tables) . This might lead to missing values in one of the table . ie tableA might get the value 1 while tableB might use value 2 and tableA will get 3. This means that tableA will have 1 and 3 (missing 2).
Apart from this , there is another scenario where you have a table which is truncated every day . Since Sequences are not having any link with table , the truncated table will continue to use the Seq.NextVal again (unless you manually reset the sequence) leading to missing values or even more dangerous arthmetic overflow error after sometime.
Owing to above reason , i feel that both Oracle sequences and SQL server identity column are good for their purposes. I would prefer oracle implementing the concept of Identity column and SQL Server implementing the sequence concept so that developers can implement either of the two as per their requirement.
The whole purpose of an identity column is that it will contain a unique value for each row in the table. So why would you need more than one of them in any given table?
Perhaps you need to clarify your question, if you have a real need for more than one.
An identity column is used to uniquely identify a single row of a table. If you want other columns to be unique, you can create a UNIQUE index for each "identity" column that you may need.
I've always seen this as an arbitrary and bad limitation for SQL Server. Yes, you only want one identity column to actually identify a row, but there are valid reasons why you would want the database to auto-generate a number for more than one field in the database.
That's the nice thing about sequences in Oracle. They're not tied to a table. You can use several different sequences to populate as many fields as you like in the same table. You could also have more than one table share the same sequence, although that's probably a really bad decision. But the point is you could. It's more granular and gives you more flexibility.
The bad thing about sequences is that you have to write code to actually increment them, whether it's in your insert statement or in an on-insert trigger on the table. The nice thing about SQL Server identity is that all you have to do is change a property or add a keyword to your table creation and you're done.