EF performing a SELECT on every insert with an Identity column

I have noticed that when I insert with EF, it performs a select to find the next PK.
I have a PK field with Identity set and auto-increment enabled.
Here is the query:
SELECT [ackId]
FROM [dbo].[Acks]
WHERE @@ROWCOUNT > 0 AND [ackId] = scope_identity()
I happened to notice it because it was at the top of the Recent Expensive Queries list in SQL Server Management Studio. It doesn't quite make sense to me that the query that finds the PK is more expensive than the actual insert.
Is this normal behaviour, or is it caused by Entity Framework?
Another concern: if EF runs a SELECT to get the value, what happens when several connections are writing to the database? Couldn't two of those selects return the same value?

Yes, this is normal behavior when inserting a new entity with an identity key:
[DatabaseGenerated(DatabaseGeneratedOption.Identity)]
which is the default convention for numeric and GUID keys:
Code First infers that a property is a primary key if a property on a class is named “ID” (not case sensitive), or the class name followed by "ID". If the type of the primary key property is numeric or GUID it will be configured as an identity column. - MSDN
EF will update the temporary key with the inserted key by selecting the last identity value.
The Entity Framework replaces the value of the property in a temporary key with the identity value that is generated by the data source after SaveChanges is called. - MSDN
Selecting scope_identity() returns the last identity value generated in the current scope, which is the new key of the entity that was just inserted.
If you don't want EF to select the identity value every time you insert a new entity, you can turn off the identity option with the following attribute or with the equivalent fluent API configuration:
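To make the insert-then-select round trip concrete, here is a minimal sketch using Python's standard sqlite3 module. It is a stand-in, not EF: the assumption is that SQLite's INTEGER PRIMARY KEY behaves like an IDENTITY column and that last_insert_rowid() plays the role of scope_identity(). The Acks table is borrowed from the question.

```python
import sqlite3

# SQLite stands in for SQL Server here (assumption): an INTEGER PRIMARY KEY
# column gets its value from the database, much like an IDENTITY column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Acks (ackId INTEGER PRIMARY KEY, note TEXT)")


def insert_ack(conn, note):
    """Insert a row without a key, then ask the database which key it assigned."""
    conn.execute("INSERT INTO Acks (note) VALUES (?)", (note,))
    # Second statement, analogous to EF's
    # SELECT [ackId] FROM [dbo].[Acks] WHERE @@ROWCOUNT > 0 AND [ackId] = scope_identity()
    return conn.execute("SELECT last_insert_rowid()").fetchone()[0]


first = insert_ack(conn, "hello")
second = insert_ack(conn, "world")
print(first, second)  # 1 2
```

The extra SELECT is cheap on its own; it shows up in monitoring mainly because it runs once per inserted row.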
[DatabaseGenerated(DatabaseGeneratedOption.None)]
If you insert a lot of records and don't want EF to re-select the identity key, you can write a plain ADO.NET SQL query or try a bulk insert instead.

This is a common pattern found in every ORM that supports database-generated identity keys. Identity is a key concept of entities. For example, two clients with the same name are still two distinct clients. A surrogate key like ClientId is the only way to tell them apart.
An ORM needs to know the surrogate key value that the database assigned, and the only way to get it unambiguously when inserting data is to query scope_identity() directly.
This never causes race conditions, because an identity column is always incremented when an insert happens (the increment is never rolled back) and scope_identity() returns only the identity value generated within the current session and scope, that is, by your own INSERT statement, never one generated on another connection.
The only way to get rid of this expensive pattern is to generate key values in code and set the primary key property to DatabaseGeneratedOption.None. But generating and inserting unique primary key values without concurrency problems is not trivial.
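A minimal sketch of the "generate keys in code" alternative, assuming random UUIDs (Python's uuid4) as keys and SQLite in place of SQL Server. Because every key exists before the INSERT, a whole batch can be sent with one executemany call and nothing has to be read back; the trade-off is a much wider key than a 4-byte int. The Clients table is hypothetical.

```python
import sqlite3
import uuid

# Keys are created by the application (the DatabaseGeneratedOption.None idea).
# Stored as text here for simplicity; SQL Server would use a 16-byte uniqueidentifier.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Clients (clientId TEXT PRIMARY KEY, name TEXT)")

# Two clients with the same name are still two distinct clients.
rows = [(str(uuid.uuid4()), name) for name in ["Alice", "Alice", "Bob"]]
with conn:  # one transaction for the whole batch
    conn.executemany("INSERT INTO Clients (clientId, name) VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM Clients").fetchone()[0]
print(count)  # 3
```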
I guess it's something you have to live with. ORMs were never meant to do bulk inserts; there are other ways to do those.

Related

Confusing t-sql exam answer about sequence or uniqueidentifier

I found a T-SQL question and its answer. It is quite confusing, and I could use a little help.
The question is:
You develop a database application. You create four tables. Each table stores different categories of products. You create a Primary Key field on each table.
You need to ensure that the following requirements are met:
The fields must use the minimum amount of space.
The fields must be an incrementing series of values.
The values must be unique among the four tables.
What should you do?
A. Create a ROWVERSION column.
B. Create a SEQUENCE object that uses the INTEGER data type.
C. Use the INTEGER data type along with IDENTITY
D. Use the UNIQUEIDENTIFIER data type along with NEWSEQUENTIALID()
E. Create a TIMESTAMP column.
The given answer is D, but I think B is more suitable, because a sequence uses less space than a GUID and still satisfies all the requirements.
D is a wrong answer, because NEWSEQUENTIALID doesn't guarantee "an incrementing series of values" (second requirement).
NEWSEQUENTIALID()
Creates a GUID that is greater than any GUID previously generated by this function on a specified computer since Windows was started. After restarting Windows, the GUID can start again from a lower range, but is still globally unique.
I'd say that B (sequence) is the correct answer. At least, you can use a sequence to fulfil all three requirements, if you don't restart/recycle it manually. I think it is the easiest way to meet all three requirements.
Among the choices provided, B is the correct answer, since it meets all the requirements:
ROWVERSION is a bad choice for a primary key, as stated in MSDN:
Every time that a row with a rowversion column is modified or inserted, the incremented database rowversion value is inserted in the rowversion column. This property makes a rowversion column a poor candidate for keys, especially primary keys. Any update made to the row changes the rowversion value and, therefore, changes the key value. If the column is in a primary key, the old key value is no longer valid, and foreign keys referencing the old value are no longer valid.
TIMESTAMP is deprecated, as stated in that same page:
The timestamp syntax is deprecated. This feature will be removed in a future version of Microsoft SQL Server. Avoid using this feature in new development work, and plan to modify applications that currently use this feature.
An IDENTITY column does not guarantee uniqueness unless all its values are only ever generated automatically (you can use SET IDENTITY_INSERT to insert values manually), nor does it guarantee uniqueness of any value across tables.
A GUID is practically guaranteed to be unique per system, so if a GUID is the primary key for all 4 tables, it ensures uniqueness across all of them. The one requirement it doesn't fulfill is storage size: it takes four times the space of an int (16 bytes instead of 4).
A SEQUENCE that is not declared with CYCLE guarantees uniqueness and has the lowest storage size.
The sequence of numeric values is generated in an ascending or descending order at a defined interval and can be configured to restart (cycle) when exhausted.
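SQLite has no SEQUENCE object, so the sketch below emulates option B with a one-row counter table shared by four hypothetical category tables (Books, Music, Games, Films). It only illustrates the three requirements (small integer keys, increasing values, uniqueness across tables); in SQL Server you would use CREATE SEQUENCE and NEXT VALUE FOR instead.

```python
import sqlite3

# One-row counter table standing in for a SEQUENCE (assumption for illustration).
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE product_seq (next_value INTEGER NOT NULL)")
conn.execute("INSERT INTO product_seq VALUES (1)")

tables = ["Books", "Music", "Games", "Films"]  # hypothetical category tables
for t in tables:
    conn.execute(f"CREATE TABLE {t} (id INTEGER PRIMARY KEY, name TEXT)")


def next_value(conn):
    """Take the next number from the shared counter (like NEXT VALUE FOR)."""
    conn.execute("BEGIN IMMEDIATE")  # write lock: two callers can't read the same value
    try:
        value = conn.execute("SELECT next_value FROM product_seq").fetchone()[0]
        conn.execute("UPDATE product_seq SET next_value = next_value + 1")
        conn.execute("COMMIT")
        return value
    except Exception:
        conn.execute("ROLLBACK")
        raise


for i in range(8):
    table = tables[i % 4]
    conn.execute(f"INSERT INTO {table} (id, name) VALUES (?, ?)", (next_value(conn), f"item {i}"))

all_ids = [row[0] for t in tables for row in conn.execute(f"SELECT id FROM {t}")]
print(sorted(all_ids))  # [1, 2, 3, 4, 5, 6, 7, 8]
```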
However, I would probably choose a different option altogether: create a base table with a single identity column and link it 1:1 to each category table, then use an INSTEAD OF INSERT trigger on each category table that first inserts a record into the base table, then uses scope_identity() to get the new value and inserts it as the primary key of the category row.
This will enforce uniqueness as well as make it possible to use a single foreign key reference between the categories and products.
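A sketch of the base-table design, with SQLite standing in for SQL Server. One swap is needed: SQLite only allows INSTEAD OF triggers on views, so each category table gets a hypothetical <Category>_ins view that receives the inserts, and last_insert_rowid() plays the role of scope_identity() inside the trigger. Table and view names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (productId INTEGER PRIMARY KEY)")  # single identity source

categories = ["Books", "Music", "Games", "Films"]  # hypothetical category tables
for c in categories:
    conn.executescript(f"""
        CREATE TABLE {c} (
            productId INTEGER PRIMARY KEY REFERENCES Products(productId),
            name TEXT
        );
        CREATE VIEW {c}_ins AS SELECT productId, name FROM {c};
        CREATE TRIGGER {c}_ins_trg INSTEAD OF INSERT ON {c}_ins
        BEGIN
            -- 1) take the next key from the base table
            INSERT INTO Products (productId) VALUES (NULL);
            -- 2) reuse that key as the category row's primary key
            INSERT INTO {c} (productId, name) VALUES (last_insert_rowid(), NEW.name);
        END;
    """)

with conn:
    conn.execute("INSERT INTO Books_ins (name) VALUES ('SQL Primer')")
    conn.execute("INSERT INTO Music_ins (name) VALUES ('Kind of Blue')")
    conn.execute("INSERT INTO Books_ins (name) VALUES ('T-SQL Fundamentals')")

ids = {c: [r[0] for r in conn.execute(f"SELECT productId FROM {c} ORDER BY productId")]
       for c in categories}
print(ids["Books"], ids["Music"])  # [1, 3] [2]
```

Because every key comes from Products, a products table elsewhere could reference Products.productId with one foreign key.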
The issue has been discussed extensively in the past, in general:
http://blog.codinghorror.com/primary-keys-ids-versus-guids/
Constraint #3 is why a SEQUENCE could run into issues, as there is a higher risk of collision and a lower number of possible rows in each table.

Best approach for multi-tenant primary keys

I have a database used by several clients. I don't really want surrogate incremental key values to bleed between clients. I want the numbering to start from 1 and be client specific.
I'll use a two-part composite key of the tenant_id as well as an incremental id.
What is the best way to create an incremental key per tenant?
I am using SQL Server Azure. I'm concerned about locking tables, duplicate keys, etc. I'd typically set the primary key to IDENTITY and move on.
Thanks
Are you planning on using SQL Azure Federations in the future? If so, the current version of SQL Azure Federations does not support the use of IDENTITY as part of a clustered index. See the question "What alternatives exist to using guid as clustered index on tables in SQL Azure (Federations)" for more details.
If you haven't looked at Federations yet, you might want to check it out as it provides an interesting way to both shard the database and for tenant isolation within the database.
Depending upon your end goal, using Federations you might be able to use a GUID as the primary clustered index on the table and also use an incremental INT IDENTITY field on the table. This INT IDENTITY field could be shown to end-users. If you are federating on the TenantID each "Tenant table" effectively becomes a silo (as I understand it at least) so the use of IDENTITY on a field within that table would effectively be an ever increasing auto generated value which increments within a given Tenant.
When (or if) data is merged together (combining data from multiple Tenants) you would wind up with collisions on this INT IDENTITY field (hence why IDENTITY isn't supported as a primary key in federations), but as long as you aren't using this field as a unique identifier within the system at large you should be ok.
If you're looking to duplicate the convenience of having an automatically assigned unique INT key upon insert, you could add an INSTEAD OF INSERT trigger that uses MAX of the existing column +1 to determine the next value.
If the column with the identity value is the first key in an index, the MAX query will be a simple index seek, very efficient.
Transactions will ensure that unique values are assigned but this approach will have different locking semantics than the standard identity column. IIRC, SQL Server can allocate a different identity value for each transaction that requests it in parallel and if a transaction is rolled back, the value(s) allocated to it are discarded. The MAX approach would only allow one transaction to insert rows into the table at a time.
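A minimal sketch of the per-tenant "MAX + 1" idea with a composite key (tenant_id, id). Assumptions: SQLite stands in for SQL Azure, the numbering is computed in the INSERT itself rather than in an INSTEAD OF INSERT trigger, and the Orders table is hypothetical. BEGIN IMMEDIATE shows the locking trade-off described above: only one writer at a time computes the next number.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("""
    CREATE TABLE Orders (
        tenant_id INTEGER NOT NULL,
        id        INTEGER NOT NULL,
        item      TEXT,
        PRIMARY KEY (tenant_id, id)  -- key leads with tenant_id, so MAX(id) per tenant can use the index
    )
""")


def insert_order(conn, tenant_id, item):
    conn.execute("BEGIN IMMEDIATE")  # serialize writers while the next number is computed
    try:
        conn.execute(
            "INSERT INTO Orders (tenant_id, id, item) "
            "SELECT ?, COALESCE(MAX(id), 0) + 1, ? FROM Orders WHERE tenant_id = ?",
            (tenant_id, item, tenant_id),
        )
        new_id = conn.execute(
            "SELECT MAX(id) FROM Orders WHERE tenant_id = ?", (tenant_id,)
        ).fetchone()[0]
        conn.execute("COMMIT")
        return new_id
    except Exception:
        conn.execute("ROLLBACK")
        raise


results = [insert_order(conn, t, "x") for t in (1, 1, 2, 1, 2)]
print(results)  # [1, 2, 1, 3, 2]
```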
A related approach could be to have a dedicated key value table keyed by the table name, tenant ID and current identity value. It would require the same INSTEAD OF INSERT trigger and more boilerplate to query and keep that key table updated. It wouldn't improve parallel operations though; the lock would just be on a different table's record.
One possibility to fix the locking bottleneck would be to include the current SPID in the key's value (now the identity key is a combination of sequential int and whatever SPID happened to allocate it and not simply sequential), use the dedicated identity value table and insert records there per SPID as necessary; the identity table PK would be (table name, tenant, SPID) and have a non-key column with the current sequential value. That way, each SPID would have its own dynamically allocated identity pool and would only ever have its own SPID specific records locked.
Another downside is maintaining triggers that have to be updated whenever you change the columns in any of the special identity tables.

TSQL Auto Increment on Update

SQL Server 2008+
I have a table with an auto-increment column which I would like to have increment not only on insert but also update. This column is not the primary key, but there is also a primary key which is a GUID created automatically via newid().
As far as I can tell, there are two ways to do this.
1.) Delete the existing row and insert a new row with identical values (plus any updates).
or
2.) Update the existing row and use the following to get the "next" identity value:
IDENT_CURRENT('myTable') + IDENT_INCR('myTable')
In either case, I'm forced to allow identity inserts. (With option 1, because the primary key for the table needs to remain the same, and with option 2 because I'm updating the auto-increment column with a specific value.) I'm not sure what the locking/performance consequences of this are.
Any thoughts on this? Is there a better approach? The goal here is to maintain an always increasing set of integer values in the column whenever a row is inserted or updated.
I think a column of type rowversion (formerly known as "timestamp") might be your simplest choice, although at 8 bytes these can amount to fairly large integers. The "timestamp" syntax is deprecated in favor of rowversion (since ISO SQL has a timestamp datatype).
If you stay with the Identity column approach, you would probably want to put your logic into an UPDATE trigger, which would effectively replace the UPDATE with the INSERT and DELETE combination you've described.
Note that Identity column values are not guaranteed to be sequential, only increasing.
Does it need to be an integer column? A timestamp column will provide you the functionality you are looking for out of the box.
Columns with an identity property can't be updated. Once the column with an identity property on it has been assigned a value, either automatically, or with identity_insert on, it is an invariant value. Further the identity property may not be disabled or removed via alter column.
I believe what you want to look at is a SQL Server TIMESTAMP (now called rowversion in SQL Server 2008). It is fundamentally an auto-incrementing binary value. Each database has a unique rowversion counter. Each row insert/update in a table with a timestamp/rowversion column results in the counter being ticked up and the new value assigned to the inserted/modified row.
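SQLite has no rowversion type, so this sketch emulates the behaviour described above with a database-wide counter (a one-row table) and triggers that stamp the row on INSERT and UPDATE. It is an illustration of the semantics only; in SQL Server you would simply declare a rowversion column. Table, column and trigger names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE db_counter (n INTEGER NOT NULL);
    INSERT INTO db_counter VALUES (0);

    CREATE TABLE Items (
        id   TEXT PRIMARY KEY,  -- GUID-style key, as in the question
        name TEXT,
        ver  INTEGER            -- always-increasing change number
    );

    CREATE TRIGGER items_ins AFTER INSERT ON Items
    BEGIN
        UPDATE db_counter SET n = n + 1;
        UPDATE Items SET ver = (SELECT n FROM db_counter) WHERE id = NEW.id;
    END;

    -- "OF name" limits the trigger to data changes, so stamping ver does not re-fire it
    CREATE TRIGGER items_upd AFTER UPDATE OF name ON Items
    BEGIN
        UPDATE db_counter SET n = n + 1;
        UPDATE Items SET ver = (SELECT n FROM db_counter) WHERE id = NEW.id;
    END;
""")

with conn:
    conn.execute("INSERT INTO Items (id, name) VALUES ('a', 'first')")
    conn.execute("INSERT INTO Items (id, name) VALUES ('b', 'second')")
    conn.execute("UPDATE Items SET name = 'first, edited' WHERE id = 'a'")

versions = dict(conn.execute("SELECT id, ver FROM Items"))
print(versions["a"], versions["b"])  # 3 2
```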

How does hibernate populate ids of auto generated fields?

Say I have an entity with an auto-generated primary key, and I save it with values for all the other fields, which may not be unique.
The entity gets populated automatically with the id of the inserted row. How did Hibernate get hold of that primary key value?
EDIT:
If the primary key column is, say, an identity column whose value is decided entirely by the database, Hibernate issues an INSERT without that column and the database decides the value to use. Does the database communicate its decision back? (I don't think so.)
Hibernate uses three methods to extract a database-generated field value, depending on what the JDBC driver or the dialect you are using supports.
Hibernate extracts the generated value and puts it back into the POJO by:
Using the method Statement.getGeneratedKeys (Statement javadocs)
or
Inserting and selecting the generated field value directly from the insert statement. (Dialect Javadocs)
or
Executing a select statement after the insert to retrieve the generated IDENTITY value
All this is done internally by hibernate.
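The three strategies can be mimicked outside Hibernate. This sketch uses Python's sqlite3 in place of JDBC (an assumption for illustration; it is not Hibernate's code): the driver's lastrowid is comparable to Statement.getGeneratedKeys, INSERT ... RETURNING returns the value from the insert statement itself (it needs SQLite 3.35 or later, so the sketch checks the version), and a separate SELECT last_insert_rowid() corresponds to selecting the IDENTITY value after the insert. The Person table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (id INTEGER PRIMARY KEY, name TEXT)")

# 1) Driver API, comparable to Statement.getGeneratedKeys()
cur = conn.execute("INSERT INTO Person (name) VALUES ('Ann')")
id_from_driver = cur.lastrowid

# 2) Value returned by the INSERT statement itself (RETURNING needs SQLite 3.35+)
if sqlite3.sqlite_version_info >= (3, 35, 0):
    id_from_returning = conn.execute(
        "INSERT INTO Person (name) VALUES ('Bob') RETURNING id"
    ).fetchone()[0]
else:
    conn.execute("INSERT INTO Person (name) VALUES ('Bob')")
    id_from_returning = None  # not supported by this SQLite build

# 3) Separate SELECT after the INSERT, like SELECT scope_identity() in SQL Server
conn.execute("INSERT INTO Person (name) VALUES ('Cid')")
id_from_select = conn.execute("SELECT last_insert_rowid()").fetchone()[0]

print(id_from_driver, id_from_returning, id_from_select)
```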
Hope this is the explanation you are looking for.
This section of the Hibernate documentation describes the auto generation of ids. Usually the AUTO generation strategy is used for maximum portability and assuming that you use Annotations to provide your domain metadata you can configure it as follows:
@Id
@GeneratedValue(strategy=GenerationType.AUTO)
private long id;
Anyway the supplied link should provide all the detail you need on generated ids.
When you create an object with, say, a sequence-derived surrogate primary key, you pass it to the Hibernate session with that field set to the value Hibernate interprets as "not assigned" (0 by default). The field is not populated with the assigned value until the corresponding record is inserted into the database table. You can trigger the insertion either by explicitly calling flush() on the Hibernate session or by performing a database read in the same session. After that, the field holds the assigned value rather than 0.

Can you use auto-increment in MySql with out it being the primary Key

I am using GUIDs as my primary key for all my other tables, but I have a requirement that needs to have an incrementing number. I tried to create a field in the table with the auto increment but MySql complained that it needed to be the primary key.
My application uses MySql 5, nhibernate as the ORM.
Possible solutions I have thought of are:
change the primary key to the auto-increment field but still have the Id as a GUID so the rest of my app is consistent.
create a composite key with both the GUID and the auto-increment field.
My thoughts at the moment are leaning towards the composite key idea.
EDIT: The row ID (primary key) is currently the GUID. I would like to add an INT field that is auto-incremented so that it is human readable. I just didn't want to move away from the app's current standard of using GUIDs as primary keys.
A GUID value is intended to be unique across tables and even databases, so make the auto_increment column the primary key and add a UNIQUE index on the GUID.
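A minimal sketch of that layout, with SQLite's INTEGER PRIMARY KEY AUTOINCREMENT standing in for MySQL's AUTO_INCREMENT (an assumption; the MySQL DDL differs). The Tickets table is hypothetical. The application can keep identifying rows by GUID, users see the small number, and the UNIQUE index still rejects a duplicate GUID.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Tickets (
        number INTEGER PRIMARY KEY AUTOINCREMENT,  -- human-readable, incrementing
        guid   TEXT NOT NULL UNIQUE,               -- the app's usual GUID identifier
        title  TEXT
    )
""")

with conn:
    for title in ("printer jam", "vpn down"):
        conn.execute("INSERT INTO Tickets (guid, title) VALUES (?, ?)",
                     (str(uuid.uuid4()), title))

rows = conn.execute("SELECT number, title FROM Tickets ORDER BY number").fetchall()
print(rows)  # [(1, 'printer jam'), (2, 'vpn down')]

# Inserting an existing GUID again violates the UNIQUE index.
existing_guid = conn.execute("SELECT guid FROM Tickets WHERE number = 1").fetchone()[0]
try:
    with conn:
        conn.execute("INSERT INTO Tickets (guid, title) VALUES (?, ?)", (existing_guid, "copy"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```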
I would lean the other way.
Why? Because creating a composite key gives the impression to the next guy who comes along that it's OK to have the same GUID in the table twice but with different sequence numbers.
A couple of thoughts:
If your GUID is auto-incremental and unique, why not let it be the actual primary key?
On the other hand, you should never make semantic decisions based on programmatic problems: you have a problem with MySQL, not with the design of your DB.
So, a couple of workarounds here:
Creating a trigger that would set the GUID to the proper value once it's inserted. That's a MySQL solution to a MySQL problem, without altering semantics for your schema.
Before inserting, start a transaction (make sure auto commit is set to false), find out the latest GUID, increment and insert with the new value. In other words, auto-increment not automatically :P
GUIDs are not intended to be orderable; that's why AUTO_INCREMENT for them does not make sense.
You may, though, use an AUTO_INCREMENT for a second column of a composite primary key in MyISAM tables. You can create a composite key over (GUID, INT) column and make the second column to be AUTO_INCREMENT.
To generate a new GUID, just call UUID() in an INSERT statement or in a trigger.
No, only the primary key can have auto_increment as its value.
If, for some reason, you can't change the identity column to be a primary key, what about generating the auto-increment value manually via some kind of SEQUENCE table, plus a trigger that queries the SEQUENCE table and saves the next value to use? Then assign the value to the destination table in the trigger. Same effect. The only question I would have is whether the auto-incremented value is going to make it back through NHibernate without a re-select of the table.