Fabricate entities with multiple IDENTITY PKs? - sql

Disclosure: I'm a 'natural key' advocate myself and averse to the IDENTITY PK approach. But I do have a 'live and let live' approach to lifestyle choices, so no religious arguments here please :)
I have inherited a table where the only key is the IDENTITY PK column; let's call it ID. There are many tables that reference ID. The intended process of creating a new entity seems to be:
INSERT INTO the table.
Use scope_identity to grab the auto-generated ID.
Use the auto-generated ID to INSERT into related tables.
In fact, there is a helper stored proc to create an entity and return the ID. However, I have a couple of issues:
I need to go further than the helper stored proc and create rows in related tables which themselves have IDENTITY PKs, so for each entity I need to grab several auto-generated values along the way.
I need to fabricate several hundred entities and the helper procs are coded to handle one entity at a time.
What is the best way to bulk fabricate entities using the 'IDENTITY PK' design?
When using my own 'natural key' designs, I can generate the key values in advance, so it's simply a case of loading some scratch tables and INSERTing into the tables in the order expected by the foreign keys. Therefore, I'm tempted to find a sequence of high-value INTEGER values (to match the type of the IDENTITY columns) which I know isn't being used now, and hope that they still won't be in use when the time comes to do the INSERT. Is this a good idea?

Are you talking specifically about MS SQL Server?
It is unfortunate that IDENTITY columns disallow explicit inserts by default. In other DBMSs, being auto-increment wouldn't stop you from inserting an explicit value into that column, which would make it easy to choose the keys in advance. Unfortunately on SQL Server you have the inconvenience of SET IDENTITY_INSERT to worry about.
there is a helper stored proc to create an entity and return the ID.
It seems a little over-the-top to me to use an sproc for that, since it's generally as simple as selecting the SCOPE_IDENTITY(). Quite often you can avoid the explicit select by writing each insert such that it can use the last insert's SCOPE_IDENTITY() directly.
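As a rough sketch of chaining inserts on the generated key, the snippet below uses Python's sqlite3 module as a stand-in for SQL Server, with cursor.lastrowid playing the role of SCOPE_IDENTITY(). The entity and entity_detail tables and the create_entity helper are hypothetical names, not taken from the inherited schema.

```python
import sqlite3

# SQLite stands in for SQL Server; lastrowid plays the role of SCOPE_IDENTITY().
# Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity (
        id   INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT NOT NULL
    );
    CREATE TABLE entity_detail (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,
        entity_id INTEGER NOT NULL REFERENCES entity(id),
        note      TEXT NOT NULL
    );
""")

def create_entity(conn, name, notes):
    """Insert one entity plus its detail rows; return every generated ID."""
    with conn:  # commits on success, rolls back on error
        entity_id = conn.execute(
            "INSERT INTO entity (name) VALUES (?)", (name,)
        ).lastrowid
        detail_ids = [
            conn.execute(
                "INSERT INTO entity_detail (entity_id, note) VALUES (?, ?)",
                (entity_id, note),
            ).lastrowid
            for note in notes
        ]
    return entity_id, detail_ids

entity_id, detail_ids = create_entity(conn, "widget", ["first", "second"])
```

Each generated key is read straight after the insert that produced it, so no separate helper call is needed per table.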
find a sequence of high value INTEGER values which I know isn't being used now and hope that they won't be being used [...] Is this a good idea?
They don't necessarily have to be very high values; in fact if you did that often you'd be making many huge gaps in the IDENTITY values, which is generally better avoided. You could even use the MAX(column)+1 values as long as you either caught the error where someone else used those values in between times, or, better, do a select-max then insert in a transaction.
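The select-max-then-insert idea can be sketched as follows, again with sqlite3 standing in for SQL Server. The table names and the bulk_create helper are hypothetical. On SQL Server the explicit inserts would additionally need SET IDENTITY_INSERT, plus whatever locking or isolation level is required to stop concurrent writers from claiming the same values.

```python
import sqlite3

# isolation_level=None lets us issue BEGIN/COMMIT ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE entity (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE entity_detail (
        entity_id INTEGER NOT NULL REFERENCES entity(id),
        note      TEXT NOT NULL
    );
    INSERT INTO entity (name) VALUES ('existing-1'), ('existing-2');
""")

def bulk_create(conn, names):
    """Reserve MAX(id)+1 onwards and insert parents and children in one transaction."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading MAX(id)
    try:
        (max_id,) = conn.execute(
            "SELECT COALESCE(MAX(id), 0) FROM entity"
        ).fetchone()
        ids = list(range(max_id + 1, max_id + 1 + len(names)))
        # Keys are known in advance, so rows can be loaded in FK order.
        conn.executemany(
            "INSERT INTO entity (id, name) VALUES (?, ?)", list(zip(ids, names))
        )
        conn.executemany(
            "INSERT INTO entity_detail (entity_id, note) VALUES (?, ?)",
            [(i, f"detail for {n}") for i, n in zip(ids, names)],
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return ids

new_ids = bulk_create(conn, ["a", "b", "c"])
```

Because the maximum is read and the rows are written inside the same transaction, the new keys follow directly on from the existing ones and leave no gap.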

Related

Adding a row to Table A if it has a required foreign key to Table B which has a required foreign key to Table A

This might sound complicated, so I'll give an example.
Say, I have two tables Instructor and Class.
Instructor has a required field called PreferredClassID which has a foreign key against Class.
Class has a required field called CurrentInstructorID which is a foreign key against Instructor
Is it possible to insert a row to either of these tables?
If I try to insert a row into Instructor, I can't, because I need to supply a PreferredClassID; but I can't create a Class row either, because it needs a CurrentInstructorID.
If I can't do this, how would I solve this problem? Would I just need to make one of those fields non-required (even if the business requirements specify that it really should be required)?
If you find yourself here, reevaluate your data relation model.
In this case, you could simply have a lookup table called PreferredCourse with courseId and instructorId.
This will enforce that both the course and instructor exist before adding the row to the PreferredCourse lookup. It maintains the business model requirements without bending the rules of the database model.
While it may seem excessive to have another table, it will prevent a whole lot of maintenance overhead in both your database procedures and jobs, and your application code. Circular references create nothing but headaches and are easily solved with small lookup tables and JOINs.
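A minimal sketch of the lookup-table approach, using sqlite3 so it can be run as-is. Only PreferredCourse, courseId and instructorId come from the answer above; the Instructor and Course column names are assumptions, and making instructorId the primary key (one preferred course per instructor) is also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE Instructor (id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE Course     (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- Assumption: at most one preferred course per instructor.
    CREATE TABLE PreferredCourse (
        instructorId INTEGER PRIMARY KEY REFERENCES Instructor(id),
        courseId     INTEGER NOT NULL    REFERENCES Course(id)
    );
""")

with conn:
    instructor_id = conn.execute(
        "INSERT INTO Instructor (name) VALUES ('Ada')"
    ).lastrowid
    course_id = conn.execute(
        "INSERT INTO Course (title) VALUES ('Databases')"
    ).lastrowid
    # Both parents exist, so the lookup row is accepted.
    conn.execute(
        "INSERT INTO PreferredCourse (instructorId, courseId) VALUES (?, ?)",
        (instructor_id, course_id),
    )

# A preference for an instructor that does not exist is rejected by the FK.
try:
    with conn:
        conn.execute(
            "INSERT INTO PreferredCourse (instructorId, courseId) VALUES (?, ?)",
            (999, course_id),
        )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Neither base table references the other, so rows can be inserted in any order without nullable keys or disabled constraints.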
The Impaler gave an example of how to accomplish this with your current data structure. Please note, that you have to 1: make a key nullable in at least one of the tables, and then 2: Perform INSERTs in a specified order. Or, 3: disable the constraints, 4: perform INSERTS, 5: reenable constraints, 6: roll back transaction if constraints are now broken.
A whole lot can go wrong with these approaches; simply fix the relation model now, before things get out of hand.
As long as one of those foreign keys allows a null value, you're good. So you:
Insert the row that accepts the null value first (say Instructor), with a null value on the FK. Get the ID of the inserted row.
Insert in the other table (say Class). In the FK you use the ID you got from step #1. Once inserted, you get the ID of this new row.
Update the FK on the first row (Instructor) with the ID you got from step #2.
Commit.
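The four steps above can be sketched like this, with sqlite3 standing in for whichever database is actually in use. The name and title columns are hypothetical; PreferredClassID and CurrentInstructorID come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE Instructor (
        id               INTEGER PRIMARY KEY,
        name             TEXT NOT NULL,
        PreferredClassID INTEGER REFERENCES Class(id)   -- the nullable side
    );
    CREATE TABLE Class (
        id                  INTEGER PRIMARY KEY,
        title               TEXT NOT NULL,
        CurrentInstructorID INTEGER NOT NULL REFERENCES Instructor(id)
    );
""")

with conn:  # one transaction: commit at the end, roll back on any error
    # Step 1: insert the row that accepts NULL, and keep its ID.
    instructor_id = conn.execute(
        "INSERT INTO Instructor (name, PreferredClassID) VALUES (?, NULL)",
        ("Ada",),
    ).lastrowid
    # Step 2: insert the other row, pointing at the ID from step 1.
    class_id = conn.execute(
        "INSERT INTO Class (title, CurrentInstructorID) VALUES (?, ?)",
        ("Databases", instructor_id),
    ).lastrowid
    # Step 3: go back and fill in the FK on the first row.
    conn.execute(
        "UPDATE Instructor SET PreferredClassID = ? WHERE id = ?",
        (class_id, instructor_id),
    )
# Step 4: leaving the with-block commits.
```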
Alternatively, if both FKs are NOT NULL then you have a bit of a problem. The options I see for this last case are:
Use deferrable FK integrity check. Some databases do allow you to insert without checking integrity until the COMMIT happens. This is really tricky, and enabling this is looking for trouble.
Disable the FK for a short period of time. Some databases allow you to enable/disable constraints. You are not deleting them, just temporarily disabling them. If you do this, don't forget to enable them back.
Drop the constraint temporarily while you do the insert, and then add it again. This is really a workaround of last resort. Adding and dropping constraints are DDL statements, and in some databases DDL cannot participate in a transaction. Do this at your own peril.
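For the first option, here is a small sketch of deferred foreign key checking. SQLite happens to support DEFERRABLE INITIALLY DEFERRED, so it is used here to keep the example runnable; other databases differ in whether and how they support this. The key values 1, 10 and 99 are arbitrary and chosen in advance for illustration.

```python
import sqlite3

# isolation_level=None lets us issue BEGIN/COMMIT ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Instructor (
        id               INTEGER PRIMARY KEY,
        PreferredClassID INTEGER NOT NULL
            REFERENCES Class(id) DEFERRABLE INITIALLY DEFERRED
    );
    CREATE TABLE Class (
        id                  INTEGER PRIMARY KEY,
        CurrentInstructorID INTEGER NOT NULL
            REFERENCES Instructor(id) DEFERRABLE INITIALLY DEFERRED
    );
""")

# Both FKs are NOT NULL, but the checks only run at COMMIT.
conn.execute("BEGIN")
conn.execute("INSERT INTO Instructor (id, PreferredClassID) VALUES (1, 10)")
conn.execute("INSERT INTO Class (id, CurrentInstructorID) VALUES (10, 1)")
conn.execute("COMMIT")

# A dangling reference is still caught, just later: the COMMIT itself fails.
conn.execute("BEGIN")
conn.execute("INSERT INTO Instructor (id, PreferredClassID) VALUES (2, 99)")
try:
    conn.execute("COMMIT")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True
    conn.execute("ROLLBACK")  # the failed COMMIT leaves the transaction open
```

Note that this only works when the key values are known before the inserts, which is awkward with auto-generated keys.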
Something to consider (as per user7396598's answer) is looking at how normal forms apply to your data as it fits within your relational model.
In this case, it might be worth looking at the following:
With your Instructor table, is the PreferredClassID a necessary component? Does an instructor -need- to have a preferred class, or is it okay to say "Hey, I'm creating an entry for a new instructor, I don't know their preferred class."
(if they're new, they might not have a preferred class that your school offers)
This is a case where you definitely want to have a foreign key, but it should be okay to say 'I don't necessarily know the value I want to put there.'
In a similar vein, does a Class need to have an instructor when it's created? Is it possible to create a Class that an instructor has not been assigned to yet?
Again, both of these points are really a case of 'I don't know what I want to put here, but when I do, it should be a specific instance that exists in another table.'

SQL database design: storing the type of a row

I am designing a database to contain a table reference, with a column type that is one of several predefined values (e.g., book, movie, magazine, etc.). I intend the range of possible values to expand over time (e.g. if I realize that I missed the academic_paper type, I want to be able to put that in).
The easiest solution would seem to be to simply store a string representing the type into the table. But this sounds like it would result in a lot of wasted space.
The other solution I thought of is creating a new table reference_types, which the type column references as a foreign key. This seems to have the added benefit of ensuring valid foreign keys (so that I won't accidentally mistype a "magzine" somewhere in my code), and possibly allows faster queries for all media of a certain type (since integer comparisons should be much faster than string comparisons). But it would also slow my application down a bit, as joins would be required whenever I need the reference type, and it would probably complicate the logic because of those extra joins.
What are your thoughts on schema design for this problem?
Your second solution is the correct one. Create a secondary table to store your reference types and link them using a foreign key.
For further reading on this subject the search term you'd want to use is 'database normalisation'.
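A minimal sketch of the lookup-table design, run through sqlite3 so it is self-contained. The reference and reference_types names come from the question; the column names, the sample titles and the add_reference helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE reference_types (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE "reference" (
        id      INTEGER PRIMARY KEY,
        title   TEXT NOT NULL,
        type_id INTEGER NOT NULL REFERENCES reference_types(id)
    );
    INSERT INTO reference_types (name) VALUES ('book'), ('movie'), ('magazine');
""")

def add_reference(conn, title, type_name):
    """Insert a reference, resolving the type name through the lookup table."""
    row = conn.execute(
        "SELECT id FROM reference_types WHERE name = ?", (type_name,)
    ).fetchone()
    if row is None:
        raise ValueError(f"unknown reference type: {type_name!r}")
    with conn:
        conn.execute(
            'INSERT INTO "reference" (title, type_id) VALUES (?, ?)',
            (title, row[0]),
        )

add_reference(conn, "Some Book", "book")

# Adding a type later is a plain INSERT, not a schema change.
with conn:
    conn.execute("INSERT INTO reference_types (name) VALUES ('academic_paper')")
add_reference(conn, "Some Paper", "academic_paper")

# The join is only needed when the type name has to be displayed.
rows = conn.execute("""
    SELECT r.title, t.name
    FROM "reference" AS r
    JOIN reference_types AS t ON t.id = r.type_id
    ORDER BY r.id
""").fetchall()
```

A mistyped name such as "magzine" fails at the lookup step instead of being silently stored.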
Create the reference_types table. And in your references table use integer and also add a reference_type_name field.
You can query the references table to get the integer key and print its name when needed without performing a join to the other table, and still use that table to perform other operations; just keep both tables in sync with the same type names.
I know it sounds redundant, but it's really the fastest way to do a simple query by int key and have it all together.
It depends. If you will want to add other information to reference types, then use the second approach. If not, use the first one, because it's faster and the information stored is only a string (you can always select unique to retrieve your types). Read this article for more info.

SQL: Advantages of an ENUM vs. a one-to-many relationship?

I very rarely see ENUM datatypes used in the wild; a developer almost always just uses a secondary table that looks like this:
CREATE TABLE officer_ranks (
    id int PRIMARY KEY
    ,title varchar NOT NULL UNIQUE);
INSERT INTO officer_ranks VALUES (1,'2LT'),(2,'1LT'),(3,'CPT'),(4,'MAJ'),(5,'LTC'),(6,'COL'),(7,'BG'),(8,'MG'),(9,'LTG'),(10,'GEN');
CREATE TABLE officers (
    soldier_name varchar NOT NULL
    ,rank int NOT NULL REFERENCES officer_ranks(id) ON DELETE RESTRICT
    ,serial_num varchar PRIMARY KEY);
But the same thing can also be shown using a user-defined type / ENUM:
CREATE TYPE officer_rank AS ENUM ('2LT', '1LT','CPT','MAJ','LTC','COL','BG','MG','LTG','GEN');
CREATE TABLE officers (
    soldier_name varchar NOT NULL
    ,rank officer_rank NOT NULL
    ,serial_num varchar PRIMARY KEY);
(Example shown using PostgreSQL, but other RDBMS's have similar syntax)
The biggest disadvantage I see to using an ENUM is that it's more difficult to update from within an application. And it might also confuse an inexperienced developer who's used to using a SQL DB simply as a bit bucket.
Assuming that the information is mostly static (weekday names, month names, US Army ranks, etc) is there any advantage to using a ENUM?
Example shown using PostgreSQL, but other RDBMS's have similar syntax
That's incorrect. It is not an ISO/IEC/ANSI SQL requirement, so the commercial databases do not provide it (you are supposed to provide Lookup tables). The small end of town implement various "extras", but do not implement the stricter requirements, or the grunt, of the big end of town.
We do not have ENUMs as part of a DataType either, that is absurd.
The first disadvantage of ENUMs is that they are non-standard and therefore not portable.
The second big disadvantage of ENUMs is, that the database is Closed. The hundreds of Report Tools that can be used on a database (independent of the app), cannot find them, and therefore cannot project the names/meanings. If you had a normal Standard SQL Lookup table, that problem is eliminated.
The third is, when you change the values, you have to change DDL. In a Normal Standard SQL database, you simply Insert/Update/Delete a row in the Lookup table.
Last, you cannot easily get a list of the content of the ENUM; you can with a Lookup table. More important, you have a vector to perform any Dimension-Fact queries with, eliminating the need for selecting from the large Fact table and GROUP BY.
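To illustrate the last point, here is a small sqlite3 sketch in which the lookup table drives the query, so ranks with no officers still appear. The table layout loosely follows the question; the sample officers and the rank_id and name columns are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE officer_ranks (
        id    INTEGER PRIMARY KEY,
        title TEXT NOT NULL UNIQUE
    );
    CREATE TABLE officers (
        serial_num TEXT PRIMARY KEY,
        name       TEXT NOT NULL,
        rank_id    INTEGER NOT NULL REFERENCES officer_ranks(id)
    );
    INSERT INTO officer_ranks VALUES (1,'2LT'),(2,'1LT'),(3,'CPT'),(4,'MAJ');
    INSERT INTO officers VALUES ('A1','Smith',3),('A2','Jones',3),('A3','Brown',1);
""")

# Every rank is listed, including those no officer currently holds,
# because the query starts from the lookup table and not from the data.
rank_counts = conn.execute("""
    SELECT r.title, COUNT(o.serial_num)
    FROM officer_ranks AS r
    LEFT JOIN officers AS o ON o.rank_id = r.id
    GROUP BY r.id, r.title
    ORDER BY r.id
""").fetchall()
```

Here the 1LT and MAJ rows come back with a count of zero, which a query over the officers table alone could not produce.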
I don't see any advantage in using ENUMS.
They are harder to maintain and don't offer anything that a regular lookup table with proper foreign keys wouldn't allow you to do.
A disadvantage of using something like an ENUM is that you can't get a list of all the available values if they don't happen to exist in your data table, unless you hard-code the list of available values somewhere. For example, if in your OFFICERS table you don't happen to have an MG on post there's no way to know the rank exists. Thus, when BG Blowhard is relieved by MG Marjorie-Banks you'll have no way to enter the new officer's rank - which is a shame, as he is the very model of a modern Major General. :-) And what happens when a General of the Army (five-star general) shows up?
For simple types which will not change I've used domains successfully. For example, in one of my databases I've got a yes_no_dom domain defined as follows:
CREATE DOMAIN yes_no_dom
AS character(1)
DEFAULT 'N'::bpchar
NOT NULL
CONSTRAINT yes_no_dom_check
CHECK ((VALUE = ANY (ARRAY['Y'::bpchar, 'N'::bpchar])));
Share and enjoy.
ENUMS are very-very-very useful! You just have to know how to use them:
An ENUM uses very little storage: 1 or 2 bytes in MySQL, 4 bytes in PostgreSQL.
No need for additional constraint (as replacement for FK).
Cheaper changes of Values compared to natural values in FKs.
No need for additional JOIN
ENUMs are ordered, ex you can compare if Monday < Friday, or January is < June or Project Initiation is < Payroll.
Thus if you have a fixed list of string values which you want to use, an ENUM is a better solution than a lookup table. Let's say you need to list amino acids in your products, with their respective weights. Today there are ~20 amino acids. If you stored their full names, you'd need much more space each time than 2 bytes. The other option is to use artificial keys and to link to a foreign table. But what would the foreign table look like? Would it have 2 columns: ID and Amino Acid Name? And would you join that table every time? What if your main table has >40 such fields? Querying that table would involve >40 joins.
If your database hosts 1600 Tables, 400 of which are lookup tables which just replace ENUMs, your devs will waste lots of time navigating through them (in addition to the JOINs). Yes, you can work with prefixes, schemas and such.... but why not just kick those tables out?
ENUMS are Enumerated lists / ordered. That means that if you have values which are ordered, you are actually saving the hassle of maintaining a 3 columns lookup table.
The question is rather: why do I need lookup tables then?
Well, the answer is easy:
When your values are changing often
When you need to store more additional attributes --> The lookup table corresponds to a full fledged data object, and not a lookup list.
When you need it quick and dirty
And now the funny thing:
Lookup Tables and ENUMS are not complete replacements for each other!!!!
If you have a list where the PK is a single-column natural key, and the list can grow or the values can change their names (for some reason), then you could define an ENUM and use it for both: the PK in the lookup table and the FK in the main tables!
Example benefit:
you have to change the name of a lookup key. Without using the ENUM the DBMS will have to cascade the changes to all tables, where you use this value and not just your lookup table. If you are using ENUM, then you just change the value of ENUM, and there are no changes to the data.
A small advantage may lie in the fact, that you have a sort of UDT when creating an ENUM. A user defined type can be reused formally in many other database objects, e.g. in views, other tables, other types, stored procedures (in other RDBMS), etc.
Another advantage is for documentation of the allowed values of a field. Examples:
A yes/no field
A male/female field
A mr/mrs/ms/dr field
Probably a matter of taste. I prefer ENUMs for these kinds of fields, rather than foreign keys to lookup tables for such simple concepts.
Yet another advantage may be that when you use code generation or ORMs like jOOQ in Java, you can use that ENUM to generate a Java enum class from it, instead of joining the lookup table, or working with the ENUM literal's ID
It's a fact, though, that only few RDBMS support a formal ENUM type. I only know of Postgres and MySQL. Oracle or DB2 don't have it.
Advantages:
Type safety for stored procedures: will raise a type error if argument can not be coerced into the type. Like: select court_martial('3LT') would raise a type error automatically.
Custom collation order: In your example, officers could be sorted without a ranking id.
Generally speaking, enum is better for things that don't change much, and it uses slightly fewer resources, since there's no FK checks or anything like to execute on insert etc.
Using a lookup table is more elegant and/or traditional, and it's much easier to add and remove options than with an enum. It's also easier to mass-change the values than with an enum.
Well, you don't see them because developers usually use enums in programming languages such as Java, and they don't see their counterparts in database design.
In database such enums are usually text or integer fields, with no constraints. Database enums will not be translated into Java/C#/etc. enums, so the developers see no gain in this.
There are very many very good database features which are rarely used because most ORM tools are too primitive to support them.
Another benefit of enums over a lookup table is that when you write SQL functions you get type checking.

FNhibernate, GeneratedBy.HiLo, hibernate_unique_key etc

I have started using the s#arp architecture which uses FNhibernate and GeneratedBy.HiLo to generate primary keys (there is also a table hibernate_unique_key). Apparently, this is recommended practice and I would like to stick with it. Now to my problem. I have used NHibernate and hbm mapping quite a bit and usually used identity columns for my primary keys. This allowed me to seed the database using SQL. Can I do this with the aforementioned setup (hibernate_unique_key table etc.)? I need to do this because a SQL insert is much more efficient than using NHibernate + C# to seed the db with a million entities. Any feedback would be very much appreciated. Thanks.
Christian
Maybe it's a bit late but the Identity generator will break the UnitOfWork-pattern.
If you perform a Save on your currentSession it will already try to insert the entity in the DB and thus break the whole meaning of the UoW.
After many hours I found out why it was broken: the cause was this Identity generator. I now use the HiLo generator.
Following links helped me through this:
Nice article about the behaviour of these generators
You should be able to seed the database using plain SQL and still use HiLo to generate the primary keys in NHibernate. What you have to do is to set the NextHi value(s) in the HiLo table to values that are high enough that the next entity you save will get an id that is higher than the highest id set when you seed the database.
So, you should be able to do something like this:
run the schema export
seed the database using a custom sql script (you would have to supply your own id's in the script, since they are not generated by the database)
manually insert a big enough value into the hibernate_unique_key table, so that the next id generated by NHibernate is larger than the largest inserted in the seeding
use NHibernate as usual
There are a few different approaches to using HiLo with NHibernate (one shared next-hi for all entities, a next hi per entity, etc.) so you might have to do a little experimenting to find out what value(s) would be appropriate to write to the hibernate_unique_key table after the seeding, depending on your hilo strategy and what max_lo you are using etc.
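To make the arithmetic concrete, here is a generic hi/lo sketch in Python. It is not NHibernate's actual implementation, whose exact formula may differ by version and strategy. The HiLoGenerator class, the next_hi_after_seeding helper, the single next_hi column and the figures (one million seeded rows, max_lo of 99) are all assumptions for illustration.

```python
import sqlite3

class HiLoGenerator:
    """Generic hi/lo: one table round trip reserves a block of max_lo + 1 ids."""

    def __init__(self, fetch_next_hi, max_lo):
        self._fetch_next_hi = fetch_next_hi  # callable that reserves one hi value
        self._block_size = max_lo + 1
        self._lo = self._block_size          # forces a fetch on first use
        self._hi = None

    def next_id(self):
        if self._lo >= self._block_size:     # current block is used up
            self._hi = self._fetch_next_hi()
            self._lo = 0
        value = self._hi * self._block_size + self._lo
        self._lo += 1
        return value

def next_hi_after_seeding(max_seeded_id, max_lo):
    """Smallest hi whose whole block lies above max_seeded_id."""
    return max_seeded_id // (max_lo + 1) + 1

MAX_LO = 99
MAX_SEEDED_ID = 1_000_000  # highest id written by the hypothetical seed script

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE hibernate_unique_key (next_hi INTEGER NOT NULL)")
conn.execute(
    "INSERT INTO hibernate_unique_key (next_hi) VALUES (?)",
    (next_hi_after_seeding(MAX_SEEDED_ID, MAX_LO),),
)

def fetch_next_hi():
    """Read and bump next_hi atomically, as a stand-in for the real generator."""
    conn.execute("BEGIN IMMEDIATE")
    (hi,) = conn.execute("SELECT next_hi FROM hibernate_unique_key").fetchone()
    conn.execute("UPDATE hibernate_unique_key SET next_hi = next_hi + 1")
    conn.execute("COMMIT")
    return hi

gen = HiLoGenerator(fetch_next_hi, MAX_LO)
first_ids = [gen.next_id() for _ in range(3)]
```

With these numbers the first generated id lands just above the seeded range. Whatever formula the real generator uses, the value written to the table has to satisfy the same inequality, which is why some experimenting against the actual configuration is still advisable.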
As a side note, schema export does not seem to support multiple rows in the hibernate_unique_key table that well, so you might have to do some manual stuff to create all the rows in the table if you use a hilo row per entity.
You could also use Identity to generate the ids, but at the cost of worse performance with NHibernate. The reason for the performance loss is that NHibernate has to do an extra read for each insert to get the id that was generated by the database. With hilo NHibernate already knows the id that the entity will get, so there is no need for that extra read.
Another option could be to use GuidComb, which also allows NHibernate to generate the ids, and therefore removes the need to query the database to get the id after an insert. However, you then have to look at ugly guids instead of nice integers when developing. :)
I guess the problem is that the PK generation is controlled by NHibernate and not the db. So an option would be to use instance.GeneratedBy.Identity(). Do you reckon that would be sensible?
I would really appreciate any comments.
Christian

How do I enforce data integrity rules in my database?

I'm designing this collection of classes and abstract (MustInherit) classes…
This is the database table where I'm going to store all this…
As far as the Microsoft SQL Server database knows, those are all nullable ("Allow Nulls") columns.
But really, that depends on the class stored there: LinkNode, HtmlPageNode, or CodePageNode.
Rules might look like this...
How do I enforce such data integrity rules within my database?
UPDATE: Regarding this single-table design...
I'm still trying to zero in on a final architecture.
I initially started with many small tables with almost zero nullable fields.
Which is the best database schema for my navigation?
And I learned about the LINQ to SQL IsDiscriminator property.
What’s the best way to handle one-to-one relationships in SQL?
But then I learned that LINQ to SQL only supports single table inheritance.
Can a LINQ to SQL IsDiscriminator column NOT inherit?
Now I'm trying to handle it with a collection of classes and abstract classes.
Please help me with my .NET abstract classes.
Use CHECK constraints on the table. These allow you to use any kind of boolean logic (including on other values in the table) to allow/reject the data.
From the Books Online site:
You can create a CHECK constraint with any logical (Boolean) expression that returns TRUE or FALSE based on the logical operators. For the previous example, the logical expression is: salary >= 15000 AND salary <= 100000.
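As a sketch of how a CHECK constraint could express per-type rules on a single table, the snippet below uses sqlite3 so that it runs on its own. The class names LinkNode, HtmlPageNode and CodePageNode come from the question, but the actual rules were not shown, so the Url, HtmlBody and CodeFile columns and the rule that each type requires exactly one of them are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Node (
        id       INTEGER PRIMARY KEY,
        NodeType TEXT NOT NULL
                 CHECK (NodeType IN ('LinkNode', 'HtmlPageNode', 'CodePageNode')),
        Url      TEXT,   -- hypothetical column, required only for LinkNode
        HtmlBody TEXT,   -- hypothetical column, required only for HtmlPageNode
        CodeFile TEXT,   -- hypothetical column, required only for CodePageNode
        CHECK (
            (NodeType = 'LinkNode'
                AND Url IS NOT NULL AND HtmlBody IS NULL AND CodeFile IS NULL)
         OR (NodeType = 'HtmlPageNode'
                AND Url IS NULL AND HtmlBody IS NOT NULL AND CodeFile IS NULL)
         OR (NodeType = 'CodePageNode'
                AND Url IS NULL AND HtmlBody IS NULL AND CodeFile IS NOT NULL)
        )
    )
""")

def try_insert(node_type, url=None, html=None, code=None):
    """Return True if the row satisfies the table's CHECK constraints."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO Node (NodeType, Url, HtmlBody, CodeFile) "
                "VALUES (?, ?, ?, ?)",
                (node_type, url, html, code),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok_link = try_insert("LinkNode", url="https://example.com")
ok_html = try_insert("HtmlPageNode", html="<p>hello</p>")
bad_link = try_insert("LinkNode", html="<p>wrong column</p>")  # Url missing
bad_type = try_insert("ImageNode", url="https://example.com")  # unknown type
```

The columns stay nullable at the column level, while the table-level CHECK decides which combination is valid for each node type.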
It looks like you are attempting the Single Table Inheritance pattern. This pattern is covered in the Object-Relational Structural Patterns section of the book Patterns of Enterprise Application Architecture.
I would recommend the Class Table Inheritance or Concrete Table Inheritance patterns if you wish to enforce data integrity via SQL table constraints.
Though it wouldn't be my first suggestion, you could still use Single Table Inheritance and just enforce the constraints via a Stored Procedure.
You can set up some insert/update triggers. Just check if these fields are null or notnull, and reject insert/update operation if needed. This is a good solution if you want to store all the data in the same table.
You can also create a separate table for each class.
Have a unique table for each type of node.
Why not just make the class you're building enforce the data integrity for its own type?
EDIT
In that case, you can either a) use logical constraints (see below) or b) stored procedures to do inserts/edits (a good idea regardless) or c) again, just make the class enforce data integrity.
A mixture of C & B would be the course of events I take. I would have unique stored procedures for add/edits for each node type (i.e. Insert_Update_NodeType) as well as make the class perform data validation before saving data.
Personally I always insist on putting data integrity code on the table itself, either via a trigger or a check constraint. The reason is that you cannot guarantee that only the user interface will update, insert or delete records. Nor can you guarantee that someone won't write a second sp that gets around the constraints in the original sp, either without understanding the actual data integrity rules or because he or she is unaware that the sp with the rules exists. Tables are often affected by DTS or SSIS packages, dynamic queries from the user interface or through Query Analyzer or the query window, or even by scheduled jobs that run code. If you do not put the data integrity code at the table level, sooner or later your data will not have integrity.
It's probably not the answer you want to hear, but the best way to avoid logical inconsistencies is to look at database normalisation.
Stephen's answer is the best. But if you MUST, you could add a check constraint on the HtmlOrCode column and the other columns which need to change.
I am not that familiar with SQL Server, but I know with Oracle you can specify Constraints that you could use to do what you are looking for. I am pretty sure you can define constraints in SQL server also though.
EDIT: I found this link that seems to have a lot information, kind of long but may be worth a read.
Enforcing Data Integrity in Databases
Basically, there are four primary types of data integrity: entity, domain, referential and user-defined.
Entity integrity applies at the row level; domain integrity applies at the column level, and referential integrity applies at the table level.
Entity Integrity ensures a table does not have any duplicate rows and is uniquely identified.
Domain Integrity requires that a set of data values fall within a specific range (domain) in order to be valid. In other words, domain integrity defines the permissible entries for a given column by restricting the data type, format, or range of possible values.
Referential Integrity is concerned with keeping the relationships between tables synchronized.
@Zack: You can also check out this blog to read more details about data integrity enforcement: https://www.bugraptors.com/what-is-data-integrity/
SQL Server doesn't know anything about your classes. I think that you'll have to enforce this by using a Factory class that constructs/deconstructs all these for you and makes sure that you're passing the right values depending upon the type.
Technically this is not "enforcing the rules in the database" but I don't think that this can be done in a single table. Fields either accept nulls or they don't.
Another idea could be to explore SQL Functions and Stored Procedures that do the same thing. But you cannot enforce a field to be NOT NULL for one record and NULL for the next one. That's your Business Layer / Factory job.
Have you tried NHibernate? It's a much more mature product than Entity Framework. It's free.