I'm working with a moderate-sized SQL Server 2008 database (around 120 tables; backups are around 4 GB compressed) where all the table primary keys are declared as simple int columns.
At present, primary key values are generated by NHibernate with the increment identifier generator, which has worked well so far but precludes moving to a multiprocess environment.
Load on the system is growing, so I'm evaluating the work required to allow the use of multiple servers accessing a common database backend.
Transitioning to the hi-lo generator seems to be the best way forward, but I can't find a lot of detail about how such a migration would work.
Will NHibernate automatically create rows in the hi-lo table for me, or do I need to script these manually?
If NHibernate does insert rows automatically, does it properly take account of existing key values?
If NHibernate does take care of things automatically, that's great. If not, are there any tools to help?
Update
NHibernate's increment identifier generator works entirely in-memory. It's seeded by selecting the maximum value of used identifiers from the table, but from that point on allocates new values by a simple increment, without reference back to the underlying database table. If any other process adds rows to the table, you end up with primary key collisions. You can run multiple threads within the one process just fine, but you can't run multiple processes.
For comparison, the NHibernate identity generator works by configuring the database tables with identity columns, putting control over primary key generation in the hands of the database. This works well, but compromises the unit of work pattern.
The hi-lo algorithm sits in between these: generation of primary keys is coordinated through the database, allowing for multiprocessing, but the actual allocation can occur entirely in memory, avoiding problems with the unit of work pattern.
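Conceptually, each trip to the database just claims the next "hi" block, and everything else happens in memory. As a rough T-SQL illustration only (NHibernate actually issues a select/update pair rather than this single statement; the table and column names match the example below):
UPDATE dbo.HiLoLookup
SET NextEntityId = NextEntityId + 1
OUTPUT deleted.NextEntityId AS AllocatedHi;
-- With max_lo = 999, an allocated hi of 13 covers ids 13000 through 13999,
-- all handed out in memory without touching the table again.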
To use the hi-lo generator you will need to create the lookup table that will store the next value for the "Hi" part of the generated keys. You have the choice of creating a separate column for each entity table, a single column that will be used by all entities, or a combination of the two options.
If a shared column is used, each generated key value will only ever be used by a single entity, so ids are unique across all entities. This may be preferable if there are many entity tables, but it reduces the total number of ids that can be generated.
For example, our project uses a HiLoLookup table with three columns:
CREATE TABLE dbo.HiLoLookup (
    NextEntityId            BIGINT NOT NULL,
    NextAuthenticationLogId BIGINT NOT NULL,
    NextConfigurationLogId  BIGINT NOT NULL
);
The log tables have a high volume of inserts, so they have been given separate pools of hi values. The primary key columns of our regular entity tables use the 64-bit BIGINT data type, so there's no danger of overflow even if there are large gaps in the sequence of ids. A single shared pool of ids is used for everything else, to reduce administration overhead.
The hi-lo generator doesn't have built-in support for initializing itself with starting values that don't conflict with existing keys - so this will need to be performed manually.
The value to use as the starting "hi" value depends on several considerations:
The maximum existing id value - the generated ids will need to be higher than this to avoid duplicates
How many ids should be generated before requesting a new "hi" value from the db (max_lo) - a bigger value improves concurrency but increases the potential for ids to be wasted, especially if the service is restarted frequently
The max_lo value provided in your entity mappings is critical when determining your starting 'hi' values. For example, consider a table with a maximum existing id of 12345, where 1000 ids should be generated before going back to the database. In this case the starting hi value should be (12345 / 1000) + 1 = 13, and the first generated id will be 13000. Due to a quirk in the HiLoGenerator implementation, the max_lo value provided in the entity configuration needs to be 999, not 1000.
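A minimal T-SQL sketch of that initialization, assuming blocks of 1000 ids and using a placeholder dbo.Customer as the table holding your highest existing id:
DECLARE @maxId BIGINT = (SELECT MAX(Id) FROM dbo.Customer); -- placeholder table
INSERT INTO dbo.HiLoLookup (NextEntityId, NextAuthenticationLogId, NextConfigurationLogId)
VALUES ((@maxId / 1000) + 1, 1, 1); -- 12345 gives hi = 13; first generated id = 13000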
If using .hbm mappings:
<generator class="hilo">
<param name="table">dbo.HiLoLookup</param>
<param name="column">NextEntityId</param>
<param name="max_lo">999</param>
</generator>
Apart from the traditional hilo, you may also want to look into the new enhanced id generators. These can use a table (or a sequence, if the database supports that) similar in spirit to the way hilo works, but with built-in support for separate number series for different entities (if you want). With the enhanced id generators you also have the option of using either the hilo algorithm or a pooled algorithm. The benefit of "pooled" is that the id generator table shows the actual id value, not just a part of it.
These are new in NHibernate 3.3. The NHibernate reference documentation doesn't mention them yet, but the Hibernate documentation does, and they work the same in NHibernate.
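A hedged sketch of what such a mapping might look like; the parameter names follow the Hibernate documentation for the enhanced table generator, which the NHibernate port mirrors, and the table/segment names here are made up:
<generator class="enhanced-table">
<param name="table_name">NextIdValues</param>
<param name="segment_value">Customer</param>
<param name="optimizer">pooled</param>
<param name="increment_size">100</param>
</generator>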
I prefer using HILO as it does not break UOW and allows me to send multiple insert statements to the server.
Now for your questions:-
Will NHibernate automatically create rows in the hi-lo table for me, or do I need to script these manually?
You will need to create your hilo table, hilo comes in two flavours, a single number across all your tables or a number for any of your tables. I prefer the latter.
If NHibernate does insert rows automatically, does it properly take account of existing key values?
You will need to set the hi/lo values manually: the max_lo lives in the mappings and the next hi lives in the table, so you will need to change your XML mappings to:
<id name="Id" column="Id" unsaved-value="0">
<generator class="hilo">
<param name="column">NextHi</param>
<param name="where">TableName='CmsLogin'</param>
<param name="max_lo">100</param>
</generator>
</id>
The backing table can then be created by hand:
CREATE TABLE hibernate_unique_key (
TableName varchar(25) NOT NULL,
NextHi bigint NOT NULL
)
Then add a row for every table you wish to use hilo with, e.g.:
INSERT INTO hibernate_unique_key (TableName, NextHi) VALUES ('CmsLogin', 123);
INSERT INTO hibernate_unique_key (TableName, NextHi) VALUES ('Address', 456);
Note that the 123 here would start my next insert ids at roughly (123 x 100) = 12300, so as long as 12300 is bigger than my current identity values, all should be good!
And if you don't like the default table name hibernate_unique_key, you can throw this into the mix:
<param name="table">HiloValues</param>
Related
I'm creating an application with Java Spring and Oracle DB.
In the app, I want to generate a primary key value that is unique as well as ordered and without gaps: 1,2,3,4,5 instead of 1,2,5,7,8,9.
At one point I used max(id) + 1 to get the maximum id value and derive the id for the next/current transaction. However, I know it isn't safe under concurrency with multiple users or multiple sessions.
I've tried using sequences, but even with the ORDER option a sequence can still leave gaps when a transaction fails:
CREATE SEQUENCE num_seq
START WITH 1
INCREMENT BY 1
ORDER NOCACHE NOCYCLE;
I need there to be gapless values as a requirement, however I'm unsure how it's possible in the case of multiple users/multiple sessions.
Don't do it.
The goal of primary keys is not to be displayed on the UI or to be exposed to the external world, but only to provide a unique identifier of the row.
In simple words, a primary key doesn't need to be sexy or good looking. It's an internal identifier.
If you are considering the idea of having a serial identifier, that probably means you want to display it somewhere or expose it to the external world. If that's the case, then create a secondary column (also unique) that serves this "public relations" goal. It can be automatically generated, or updated at leisure, without affecting the integrity of the database.
It can also be generated by a secondary process that runs in a deferred way (e.g. every 10 minutes), finds all the "unassigned" new rows, and gives them the next numbers. This has the advantage of not being vulnerable to concurrency problems.
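A minimal sketch of such a deferred numbering job in Oracle SQL, assuming a hypothetical orders table with a nullable display_no column for the public-facing number:
MERGE INTO orders o
USING (
    SELECT id,
           (SELECT NVL(MAX(display_no), 0) FROM orders)
             + ROW_NUMBER() OVER (ORDER BY id) AS new_no
    FROM orders
    WHERE display_no IS NULL
) s
ON (o.id = s.id)
WHEN MATCHED THEN UPDATE SET o.display_no = s.new_no;
Run one instance of the job at a time so the numbering stays gapless.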
I am new to SQL; I am coming from NoSQL.
I have seen that you need to create a unique id for your rows if you want to use unique ids; they are not made automatically by the database as they are in MongoDB. One way to do so is to create auto-incrementing ids.
Are PostgreSQL auto-incrementing ids scalable? Does the DB have to insert one row at a time? How does it work?
-----EDIT-----
What I am actually wondering is in a distributed environment is there a risk that two rows may have the same id?
In Postgres, autoincrement is atomic and scalable. If some inserts fail, some ids may be missing from the sequence, but the ids that do get inserted are guaranteed to be unique.
Also, not all primary keys have to be generated this way. See my answer to your first question.
Autoincrementing columns that are defined as
id bigint PRIMARY KEY DEFAULT nextval('tab_id_seq')
or, more standard compliant, as
id bigint PRIMARY KEY GENERATED ALWAYS AS IDENTITY
use a sequence to generate unique values.
A sequence is a special database object that can very efficiently supply unique integers to concurrent database sessions. I doubt that any identity generator, be it in MongoDB or elsewhere, can be more efficient.
Since getting a new sequence value accesses shared state, you can optimize sequences for high concurrency by defining them with a CACHE value higher than 1. Each database session that uses the sequence then keeps a cache of unique values and doesn't have to access the shared state every time it needs a value.
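For example, both styles with a per-session cache of 50 values (tab is a placeholder table name):
-- Explicit sequence with a session-local cache:
CREATE SEQUENCE tab_id_seq CACHE 50;
CREATE TABLE tab (
    id bigint PRIMARY KEY DEFAULT nextval('tab_id_seq')
);
-- Or the standard-compliant identity form with the same cache setting:
-- id bigint PRIMARY KEY GENERATED ALWAYS AS IDENTITY (CACHE 50)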
I have a database used by several clients. I don't really want surrogate incremental key values to bleed between clients. I want the numbering to start from 1 and be client specific.
I'll use a two-part composite key of the tenant_id as well as an incremental id.
What is the best way to create an incremental key per tenant?
I am using SQL Server Azure. I'm concerned about locking tables, duplicate keys, etc. I'd typically set the primary key to IDENTITY and move on.
Thanks
Are you planning on using SQL Azure Federations in the future? If so, the current version of SQL Azure Federations does not support the use of IDENTITY as part of a clustered index. See the question "What alternatives exist to using guid as clustered index on tables in SQL Azure (Federations)" for more details.
If you haven't looked at Federations yet, you might want to check it out as it provides an interesting way to both shard the database and for tenant isolation within the database.
Depending upon your end goal, using Federations you might be able to use a GUID as the primary clustered index on the table and also use an incremental INT IDENTITY field on the table. This INT IDENTITY field could be shown to end-users. If you are federating on the TenantID each "Tenant table" effectively becomes a silo (as I understand it at least) so the use of IDENTITY on a field within that table would effectively be an ever increasing auto generated value which increments within a given Tenant.
When/if data is merged together (combining data from multiple Tenants), you would wind up with collisions on this INT IDENTITY field (hence why IDENTITY isn't supported as a primary key in federations), but as long as you aren't using this field as a unique identifier within the system at large, you should be OK.
If you're looking to duplicate the convenience of having an automatically assigned unique INT key upon insert, you could add an INSTEAD OF INSERT trigger that uses MAX of the existing column +1 to determine the next value.
If the column with the identity value is the first key in an index, the MAX query will be a simple index seek, very efficient.
Transactions will ensure that unique values are assigned but this approach will have different locking semantics than the standard identity column. IIRC, SQL Server can allocate a different identity value for each transaction that requests it in parallel and if a transaction is rolled back, the value(s) allocated to it are discarded. The MAX approach would only allow one transaction to insert rows into the table at a time.
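A minimal sketch of that approach, assuming a hypothetical dbo.Orders table keyed on (TenantId, Id); the UPDLOCK/HOLDLOCK hints make concurrent inserts for the same tenant queue up rather than read the same MAX:
CREATE TRIGGER dbo.trg_Orders_AssignId
ON dbo.Orders
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- Assign each incoming row the next value after the tenant's current MAX.
    INSERT INTO dbo.Orders (TenantId, Id, Amount)
    SELECT i.TenantId,
           ISNULL(m.MaxId, 0)
             + ROW_NUMBER() OVER (PARTITION BY i.TenantId ORDER BY (SELECT NULL)),
           i.Amount
    FROM inserted AS i
    CROSS APPLY (SELECT MAX(o.Id) AS MaxId
                 FROM dbo.Orders AS o WITH (UPDLOCK, HOLDLOCK)
                 WHERE o.TenantId = i.TenantId) AS m;
END;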
A related approach could be to have a dedicated key value table keyed by the table name, tenant ID and current identity value. It would require the same INSTEAD OF INSERT trigger and more boilerplate to query and keep that key table updated. It wouldn't improve parallel operations though; the lock would just be on a different table's record.
One possibility for fixing the locking bottleneck would be to include the current SPID in the key's value (so the identity key becomes a combination of a sequential int and whatever SPID happened to allocate it, rather than simply sequential), use the dedicated identity value table, and insert records there per SPID as necessary; the identity table PK would be (table name, tenant, SPID), with a non-key column holding the current sequential value. That way, each SPID would have its own dynamically allocated identity pool and would only ever have its own SPID-specific records locked.
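A sketch of what that per-SPID key table might look like (names are illustrative):
CREATE TABLE dbo.IdentityValues (
    TableName sysname NOT NULL,
    TenantId  int     NOT NULL,
    Spid      int     NOT NULL,
    NextValue int     NOT NULL,
    PRIMARY KEY (TableName, TenantId, Spid)
);
-- Each session increments only its own (TableName, TenantId, Spid) row,
-- so sessions never block each other on the key table.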
Another downside is maintaining triggers that have to be updated whenever you change the columns in any of the special identity tables.
In our DB (on SQL Server 2005) we have a "Customers" table, whose primary key is Client Code, a surrogate, bigint IDENTITY(1,1) key; the table is referenced by a number of other tables in our DB thru a foreign key.
A new CR implementation we are estimating would require us to change ID column type to varchar, Client Code generation algorithm being shifted from a simple numeric progression to a strict 2-char representation, with codes ranging from 01 to 99, then progressing like this:
1A -> 2A -> ... -> 9A -> 1B -> ... 9Z
I'm fairly new to database design, but I smell some serious problems here. First of all, what about this client code generation algorithm? What if I need a Client Code to go beyond 9Z code limit?
Then I have some questions: would this change be feasible, given that the table is already filled with a fair amount of data and referenced by multiple entities? If so, how would you approach this problem, and how would you implement Client Code generation?
I would leave the primary key as it is and create another (unique) key on the generated client code.
I would do that anyway. It's always better to have a short numeric primary key than a long char key.
In some situations you might prefer a GUID (for replication purposes), but an int/bigint number is always preferable.
My biggest concern with what you are proposing is that you will be limited to 360 primary records. That seems like a small number.
Performing the change is a multi-step operation. You need to create the new field in the core table and all its related tables.
To do an in-place update, you need to generate the code in the core table. Then you need to update all the related tables to have the code based on the old id. Then you need to add the foreign key constraint to all the related tables. Then you need to remove the old key field from all the related tables.
We only did that in our development server. When we upgraded the live databases, we created a new database for each and copied the data over using a python script that queried the old database and inserted into the new database. I now update that script for every software upgrade so the core engine stays the same, but I can specify different tables or data modifications. I get the bonus of having a complete backup of the original database if something unexpected happens when upgrading production.
One strong argument in favor of a non-identity/guid code is that you want a human readable/memorable code and you need to be able to move records between two systems.
Performance is not necessarily a concern in SQL Server 2005 and 2008. We recently went through a change where we moved from int ids everywhere to 7 or 8 character "friendly" record codes. We expected to see some kind of performance hit, but we in fact saw a performance improvement.
We also found that we needed a way to quickly generate a code. Our codes have two parts: a 3-character alpha prefix and a 4- or 5-digit suffix. Once we had a large number of codes (15000-20000), we found it too slow to parse each code into prefix and suffix and find the lowest unused code (it took several seconds). Because of this, we also store the prefix and the suffix separately (in the primary key table) so that we can quickly find the next available lowest code with a particular prefix. The cached prefix and suffix made the search almost free.
We allow changing of the codes, and the changed values propagate via cascade update rules on the foreign key relationships. We keep an identity key on the core code table to simplify updating the code.
We don't use an ORM, so I don't know what specific things to be aware of with that. We also have on the order of 60,000 primary keys in our biggest instance, but have hundreds of tables related and tables with millions of related values to the code table.
One big advantage that we got was, in many cases, we did not need to do a join to perform operations. Everywhere in the software the user references things by friendly code. We don't have to do a lookup of the int ID (or a join) to perform certain operations.
The new code generation algorithm isn't worth thinking about. You can write a program to generate all possible codes in just a few lines of code. Put them in a table, and you're practically done. You just need to write a function to return the smallest one not yet used. Here's a Ruby program that will give you all the possible codes.
# test.rb -- generate a peculiar sequence of two-character codes.
i = 1
('A'..'Z').each do |c|
  (1..9).each do |n|
    printf("'%d%s', %d\n", n, c, i)
    i += 1
  end
end
The program writes CSV to standard output; redirect it to a file and you should be able to import it easily into a table. You need two columns to control the sort order, because the new values don't naturally sort the way your requirements specify.
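For instance, one possible shape for that lookup table (names are illustrative); the integer column preserves the intended order, since as text '1B' would sort before '9A':
CREATE TABLE ClientCodes (
    Code      CHAR(2) NOT NULL PRIMARY KEY,
    SortOrder INT     NOT NULL UNIQUE
);
-- Smallest code not yet used (Customers.ClientCode is a placeholder):
-- SELECT TOP 1 Code FROM ClientCodes
-- WHERE Code NOT IN (SELECT ClientCode FROM Customers)
-- ORDER BY SortOrder;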
I'd be more concerned about the range than the algorithm. If you're right about the requirement, you're limited to 234 client codes. If you're wrong, and the range extends from "1A" to "ZZ", you're limited to less than a thousand.
To implement this requirement in an existing table, you need to follow a careful procedure. I'd try it several times in a test environment before trying it on a production table. (This is just a sketch. There are a lot of details.)
Create and populate a two-column table to map existing bigints to the new CHAR(2) codes.
Create new CHAR(2) columns in all the tables that need them.
Update all the new CHAR(2) columns.
Create new NOT NULL UNIQUE or PRIMARY KEY constraints and new FOREIGN KEY constraints on the new CHAR(2) columns.
Rewrite user interface code (?) to target the new columns. (Might not be necessary if you rename the new CHAR(2) and old BIGINT columns.)
Set a target date to drop the old BIGINT columns and constraints.
And so on.
Not really addressing whether this is a good idea or not, but you can change your foreign keys to cascade the updates. What will happen once you're done doing that is that when you update the primary key in the parent table, the corresponding key in the child table will be updated accordingly.
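As a sketch with made-up constraint and table names, recreating a child table's foreign key with cascading updates might look like this:
ALTER TABLE dbo.Orders DROP CONSTRAINT FK_Orders_Customers;
ALTER TABLE dbo.Orders
    ADD CONSTRAINT FK_Orders_Customers
    FOREIGN KEY (ClientCode) REFERENCES dbo.Customers (ClientCode)
    ON UPDATE CASCADE;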
I have started using the S#arp Architecture, which uses Fluent NHibernate and GeneratedBy.HiLo to generate primary keys (there is also a hibernate_unique_key table). Apparently this is recommended practice, and I would like to stick with it. Now to my problem: I have used NHibernate with hbm mappings quite a bit and usually used identity columns for my primary keys, which allowed me to seed the database using SQL. Can I still do this with the aforementioned setup (hibernate_unique_key table etc.)? I need to, because SQL inserts are much more efficient than using NHibernate + C# to seed the db with a million entities. Any feedback would be very much appreciated. Thanks.
Christian
Maybe it's a bit late, but the identity generator will break the unit of work pattern.
If you perform a Save on your current session, NHibernate will immediately insert the entity into the DB (to fetch the generated id), and thus break the whole point of the UoW.
After many hours I found that this was why my unit of work was broken, and the cause was the identity generator. I now use the HiLo generator.
You should be able to seed the database using plain SQL and still use HiLo to generate the primary keys in NHibernate. What you have to do is to set the NextHi value(s) in the HiLo table to values that are high enough that the next entity you save will get an id that is higher than the highest id set when you seed the database.
So, you should be able to do something like this:
run the schema export
seed the database using a custom sql script (you would have to supply your own id's in the script, since they are not generated by the database)
manually insert a big enough value into the hibernate_unique_key table, so that the next id generated by NHibernate is larger than the largest id inserted during seeding (see the sketch after this list)
use NHibernate as usual
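A hedged sketch of that third step, assuming a single shared NextHi row, a max_lo of 100, and a placeholder dbo.Customer table holding the largest seeded id:
DECLARE @maxSeededId BIGINT = (SELECT MAX(Id) FROM dbo.Customer); -- placeholder
UPDATE hibernate_unique_key
SET NextHi = (@maxSeededId / 100) + 1; -- next hi block starts above every seeded id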
There are a few different approaches to using HiLo with NHibernate (one shared next-hi for all entities, a next hi per entity, etc.) so you might have to do a little experimenting to find out what value(s) would be appropriate to write to the hibernate_unique_key table after the seeding, depending on your hilo strategy and what max_lo you are using etc.
As a side note, schema export does not seem to support multiple rows in the hibernate_unique_key table that well, so you might have to do some manual stuff to create all the rows in the table if you use a hilo row per entity.
You could also use Identity to generate the ids, but at the cost of worse performance with NHibernate. The reason for the performance loss is that NHibernate has to do an extra read for each insert to get the id that was generated by the database. With hilo NHibernate already knows the id that the entity will get, so there is no need for that extra read.
Another option could be to use GuidComb, which also allows NHibernate to generate the ids, and therefore removes the need to query the database to get the id after an insert. However, you then have to look at ugly guids instead of nice integers when developing. :)
I guess the problem is that the pk generation is controlled by NHibernate and not the db, so an option would be to use instance.GeneratedBy.Identity(). Do you reckon that would be sensible?
I would really appreciate any comments.
Christian