On a project I'm working on, we're migrating data from an old database structure into a new one, and we need to preserve the old keys for a few tables for backwards compatibility with some existing application functionality.
Currently, we are considering two approaches to address this need:
1. Create an extra nullable field on each table and insert the old key into that new field.
2. Create companion table(s) that contain the old and new key mappings.
Note: new data will not generate old ID keys, so with approach #1 the nullable field will increasingly contain NULLs as new records are added.
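For concreteness, here is a minimal sketch of what each approach might look like in T-SQL; the Customer table and column names are hypothetical:

    -- Approach #1: extra nullable column on the new table
    ALTER TABLE dbo.Customer
        ADD LegacyCustomerID INT NULL;  -- filled during migration, NULL for new rows

    -- Approach #2: companion mapping table
    CREATE TABLE dbo.CustomerKeyMap (
        CustomerID       INT NOT NULL PRIMARY KEY
            REFERENCES dbo.Customer (CustomerID),
        LegacyCustomerID INT NOT NULL UNIQUE   -- only migrated rows get a row here
    );

Note that approach #2 keeps the NULLs out of the main table entirely: only migrated records get a row in the mapping table.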
Which approach is better for a cleaner database design, and data management long-term?
Do you see any issues with either approach, and if so, what issues?
Is there a #3 approach that I haven't thought of yet?
You mention SQL, but is it SQL Server?
If it is SQL Server, look into SET IDENTITY_INSERT. This lets you explicitly insert values into an identity (auto-increment) column instead of that column being protected.
However, I believe that if you then explicitly include the PK and its value in the INSERT statement, SQL Server will respect it and store the original key in its original column, without forcing you to add yet another column for backward-compatibility purposes.
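For example, a minimal sketch (table and column names are hypothetical):

    -- Allow explicit values in the identity column for this table
    SET IDENTITY_INSERT dbo.Customer ON;

    -- Supply the old key explicitly; an explicit column list is required
    INSERT INTO dbo.Customer (CustomerID, Name)
    VALUES (1042, 'Acme Corp');

    -- Turn it back off; only one table per session can have it ON at a time
    SET IDENTITY_INSERT dbo.Customer OFF;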
I would like to insert some data using plain SQL into some tables that use hilo ID generation in conjunction with NHibernate. Is this possible? I have found some similar questions but no definite answer yet. Thanks!
Chris
Sure you can do it. Just update the hi value in the appropriate table and generate the IDs for your inserts yourself. NHibernate won't validate the IDs in the DB.
The unique key table is used only for inserts; once an object is in the DB, it no longer matters where its ID came from.
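A minimal sketch of that idea, assuming NHibernate's default hibernate_unique_key table with a next_hi column and the default max_lo of 32767 (the exact id formula depends on your mapping, so treat the block arithmetic as an assumption to verify):

    -- Reserve a whole hi block for the hand-written inserts
    DECLARE @hi INT;

    BEGIN TRANSACTION;
        SELECT @hi = next_hi
        FROM hibernate_unique_key WITH (UPDLOCK, HOLDLOCK);

        UPDATE hibernate_unique_key SET next_hi = next_hi + 1;
    COMMIT;

    -- With max_lo = 32767, ids in roughly the block
    -- [@hi * 32768, @hi * 32768 + 32767] are now yours to assign
    INSERT INTO dbo.MyEntity (Id, Name)
    VALUES (@hi * 32768 + 1, 'row inserted with plain SQL');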
Disclosure: I'm a 'natural key' advocate myself and averse to the IDENTITY PK approach. But I do have a 'live and let live' approach to lifestyle choices, so no religious arguments here please :)
I have inherited a table where the only key is the IDENTITY PK column; let's call it ID. There are many tables that reference ID. The intended process of creating a new entity seems to be:
1. INSERT INTO the table.
2. Use SCOPE_IDENTITY() to grab the auto-generated ID.
3. Use the auto-generated ID to INSERT into related tables.
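In T-SQL, that flow looks roughly like this (table names are hypothetical):

    DECLARE @NewID INT;

    INSERT INTO dbo.Entity (Name) VALUES (N'Widget');
    SET @NewID = SCOPE_IDENTITY();  -- the ID just generated in this scope

    INSERT INTO dbo.EntityDetail (EntityID, Detail)
    VALUES (@NewID, N'some detail');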
In fact, there is a helper stored proc to create an entity and return the ID. However, I have a couple of issues:
I need to go further than the helper stored proc and create rows in related tables which themselves have IDENTITY PKs, so for each entity I need to grab several auto-generated values along the way.
I need to fabricate several hundred entities and the helper procs are coded to handle one entity at a time.
What is the best way to bulk fabricate entities using the 'IDENTITY PK' design?
When using my own 'natural key' designs, I can generate the key values in advance, so it's simply a case of loading some scratch tables and INSERTing into the tables in the order expected by the foreign keys. I'm therefore tempted to find a sequence of high INTEGER values (to match the type of the IDENTITY columns) which I know isn't being used now, and hope they won't be in use when the time comes to do the INSERT. Is this a good idea?
Are you talking specifically about MS SQL Server?
It is unfortunate that IDENTITY columns disallow explicit inserts by default. In other DBMSs, an auto-increment column wouldn't stop you from inserting an explicit value, which would make it easy to choose the keys in advance. On SQL Server, though, you have the inconvenience of SET IDENTITY_INSERT to worry about.
there is a helper stored proc to create an entity and return the ID.
It seems a little over-the-top to me to use a sproc for that, since it's generally as simple as selecting SCOPE_IDENTITY(). Quite often you can avoid the explicit SELECT by writing each insert so that it uses the previous insert's SCOPE_IDENTITY() directly.
find a sequence of high value INTEGER values which I know isn't being used now and hope that they won't be being used [...] Is this a good idea?
They don't necessarily have to be very high values; in fact, if you did that often you'd be leaving many huge gaps in the IDENTITY values, which is generally better avoided. You could even use MAX(column)+1, as long as you either catch the error when someone else uses those values in the meantime or, better, do the SELECT MAX and the INSERT inside a single transaction.
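A sketch of the transactional version; the locking hints hold the read stable so a concurrent insert can't grab the same value (table names are hypothetical):

    BEGIN TRANSACTION;

    DECLARE @NextID INT;

    -- UPDLOCK + HOLDLOCK keeps the scanned range locked until we commit
    SELECT @NextID = ISNULL(MAX(ID), 0) + 1
    FROM dbo.Entity WITH (UPDLOCK, HOLDLOCK);

    SET IDENTITY_INSERT dbo.Entity ON;
    INSERT INTO dbo.Entity (ID, Name) VALUES (@NextID, N'fabricated row');
    SET IDENTITY_INSERT dbo.Entity OFF;

    COMMIT;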
Is it possible in hibernate to have an entity where some IDs are assigned and some are generated?
For instance:
Some objects have an ID between 1 and 10000 that is generated outside the database, while other entities come in with no ID and need one generated by the database.
You could use 'assigned' as the ID generation strategy, but you would have to give the entity its ID before you save it to the database. Alternatively, you could build your own implementation of org.hibernate.id.IdentifierGenerator to provide the ID in the manner you've suggested.
I have to agree with Cade Roux though; doing so seems much more difficult than using the built-in increment, UUID, or other forms of ID generation.
I would avoid this and simply have an auxiliary column for the information about the source of the object and a column for the external identifier (assuming the external identifier was an important value you wanted to keep track of).
It's generally a bad idea to use columns for mixed purposes - in this case to infer from the nature of a surrogate key the source of an object.
Use any generator you like, make sure it can start at an offset (when you use a sequence, you can initialize it accordingly).
For all other entities, call setId() before you insert them. Hibernate will only generate an ID if the ID property is 0. Note that you should insert objects with assigned IDs into the DB first and only then work with them; there is a lot of code in Hibernate that expects an object to be in the DB when id != 0.
Another solution is to use negative IDs for entities that come with an ID. This also ensures there are no collisions when you insert a new object.
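For example, if the pre-assigned IDs live in 1-10000, the generator can simply be started above that range (SQL Server 2012+ sequence syntax shown; other DBMSs have equivalent options):

    -- Generated ids can never collide with the externally assigned 1-10000 range
    CREATE SEQUENCE dbo.EntityIdSeq
        AS INT
        START WITH 10001
        INCREMENT BY 1;

    SELECT NEXT VALUE FOR dbo.EntityIdSeq;  -- 10001, 10002, ...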
I'm designing this collection of classes and abstract (MustInherit) classes…
This is the database table where I'm going to store all this…
As far as the Microsoft SQL Server database knows, those are all nullable ("Allow Nulls") columns.
But really, that depends on the class stored there: LinkNode, HtmlPageNode, or CodePageNode.
Rules might look like this...
How do I enforce such data integrity rules within my database?
UPDATE: Regarding this single-table design...
I'm still trying to zero in on a final architecture.
I initially started with many small tables with almost zero nullable fields.
Which is the best database schema for my navigation?
And I learned about the LINQ to SQL IsDiscriminator property.
What’s the best way to handle one-to-one relationships in SQL?
But then I learned that LINQ to SQL only supports single table inheritance.
Can a LINQ to SQL IsDiscriminator column NOT inherit?
Now I'm trying to handle it with a collection of classes and abstract classes.
Please help me with my .NET abstract classes.
Use CHECK constraints on the table. These allow you to use any kind of boolean logic (including on other values in the table) to allow/reject the data.
From the Books Online site:
You can create a CHECK constraint with any logical (Boolean) expression that returns TRUE or FALSE based on the logical operators. For the previous example, the logical expression is: salary >= 15000 AND salary <= 100000.
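Applied to the node table in the question, a constraint might look something like this; the NodeType discriminator and the per-type columns are guesses based on the classes described:

    ALTER TABLE dbo.Node ADD CONSTRAINT CK_Node_TypeRules CHECK (
           (NodeType = 'LinkNode'     AND LinkUrl IS NOT NULL AND HtmlOrCode IS NULL)
        OR (NodeType = 'HtmlPageNode' AND HtmlOrCode IS NOT NULL AND LinkUrl IS NULL)
        OR (NodeType = 'CodePageNode' AND HtmlOrCode IS NOT NULL AND LinkUrl IS NULL)
    );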
It looks like you are attempting the Single Table Inheritance pattern, which is covered in the Object-Relational Structural Patterns section of the book Patterns of Enterprise Application Architecture.
I would recommend the Class Table Inheritance or Concrete Table Inheritance patterns if you wish to enforce data integrity via SQL table constraints.
Though it wouldn't be my first suggestion, you could still use Single Table Inheritance and just enforce the constraints via a Stored Procedure.
You can set up insert/update triggers. Just check whether the relevant fields are NULL or NOT NULL, and reject the INSERT/UPDATE operation if needed. This is a good solution if you want to store all the data in the same table.
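A rough sketch of such a trigger; table and column names are hypothetical:

    CREATE TRIGGER trg_Node_Integrity
    ON dbo.Node
    AFTER INSERT, UPDATE
    AS
    BEGIN
        -- Reject the whole statement if any row violates its type's rules
        IF EXISTS (SELECT 1 FROM inserted
                   WHERE NodeType = 'LinkNode' AND LinkUrl IS NULL)
        BEGIN
            RAISERROR('LinkNode rows must have a LinkUrl.', 16, 1);
            ROLLBACK TRANSACTION;
        END
    END;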
You could also create a separate table for each class.
Have a unique table for each type of node.
Why not just make the class you're building enforce the data integrity for its own type?
EDIT
In that case, you can either a) use logical constraints (see below), b) use stored procedures to do inserts/edits (a good idea regardless), or c) again, just make the class enforce data integrity.
A mixture of (c) and (b) is the route I would take. I would have separate stored procedures for adds/edits for each node type (i.e. Insert_Update_NodeType, sketched below) and also have the class perform data validation before saving data.
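A hypothetical sketch of one such per-type procedure, combining the validation with the insert/update:

    CREATE PROCEDURE dbo.Insert_Update_LinkNode
        @ID      INT = NULL,        -- NULL means insert a new row
        @Name    NVARCHAR(100),
        @LinkUrl NVARCHAR(400)
    AS
    BEGIN
        IF @LinkUrl IS NULL
        BEGIN
            RAISERROR('LinkNode requires a LinkUrl.', 16, 1);
            RETURN;
        END;

        IF @ID IS NULL
            INSERT INTO dbo.Node (NodeType, Name, LinkUrl)
            VALUES ('LinkNode', @Name, @LinkUrl);
        ELSE
            UPDATE dbo.Node
            SET Name = @Name, LinkUrl = @LinkUrl
            WHERE ID = @ID AND NodeType = 'LinkNode';
    END;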
Personally, I always insist on putting data integrity code on the table itself, either via a trigger or a check constraint. The reason is that you cannot guarantee that only the user interface will update, insert, or delete records. Nor can you guarantee that someone won't write a second sproc to get around the constraints in the original sproc, either without understanding the actual data integrity rules or simply because he or she is unaware the sproc with the rules exists. Tables are often affected by DTS or SSIS packages, dynamic queries from the user interface, Query Analyzer or the query window, or scheduled jobs that run code. If you do not put the data integrity code at the table level, sooner or later your data will lose integrity.
It's probably not the answer you want to hear, but the best way to avoid logical inconsistencies is to look at database normalisation.
Stephen's answer is the best. But if you MUST, you could add a check constraint to the HtmlOrCode column and the other columns that need to change.
I am not that familiar with SQL Server, but I know that with Oracle you can specify constraints to do what you are looking for. I am pretty sure you can define constraints in SQL Server too.
EDIT: I found this link that seems to have a lot of information; it's kind of long but may be worth a read.
Enforcing Data Integrity in Databases
Basically, there are four primary types of data integrity: entity, domain, referential and user-defined.
Entity integrity applies at the row level; domain integrity applies at the column level, and referential integrity applies at the table level.
Entity Integrity ensures that a table has no duplicate rows and that each row is uniquely identified.
Domain Integrity requires that a set of data values fall within a specific range (domain) in order to be valid. In other words, domain integrity defines the permissible entries for a given column by restricting the data type, format, or range of possible values.
Referential Integrity is concerned with keeping the relationships between tables synchronized.
@Zack: You can also check out this blog for more details about data integrity enforcement: https://www.bugraptors.com/what-is-data-integrity/
SQL Server doesn't know anything about your classes. I think that you'll have to enforce this by using a Factory class that constructs/deconstructs all these for you and makes sure that you're passing the right values depending upon the type.
Technically this is not "enforcing the rules in the database" but I don't think that this can be done in a single table. Fields either accept nulls or they don't.
Another idea could be to explore SQL functions and stored procedures that do the same thing. But you cannot enforce a field to be NOT NULL for one record and NULL for the next. That's the job of your business layer / factory.
Have you tried NHibernate? It's a much more mature product than Entity Framework, and it's free.