Best database design for multiple entity types - sql

I'm working on a web app and I have to design it's database. There's a part that didn't come very straightforward to me, so after some thinking and research I came with multiple ideas. Still neither seems completely suitable, so I'm not sure which one to implement and why.
The simplified problem looks as follows:
I have a table Teacher. There are 2 types of teachers, according to the relations with their Fields and Subjects:
A Teacher that's related to a Field, the Field is obligatory related to a Category
A Teacher that's not related to a Field, but directly to a Category
My initial idea was to have two nullable foreign keys, one to the table Field, and the other to the table Category. But in this case, how can I make sure that exactly one is null, and the other one is not?
The other idea is to create a hierarchy, with two types of Teacher tables derived from the table Teacher (is-a relation), but I couldn't find any useful tutorial on this.
I'm developing the app using Django with SQLite db

OK, your comment made it much clearer:
If a Teacher belongs to exactly one category, you should keep this in the Teacher's table directly:
Secondly each teacher belongs to "one or zero" fields. If this is sure for ever you should use a nullable FieldID column. This is set or remains empty.
Category (CategoryID, Name, ...)
Field (FieldID,Name,...)
Teacher (TeacherID,FieldID [NULL FK],CategoryID [NOT NULL FK], FirstName, Lastname, ...)
Remark: This is almost the same as my mapping table of the last answer. The only difference is, that you'll have a strict limitation with your "exactly one" or "exactly none or one"... From my experience I'd still prefer the open approach. It is easy to enforce your rules with unique indexes including the TeacherID-column. Sooner or later you'll probably have to re-structure this...
As you continue, one category is related to "zero or more" fields. There are two approaches:
Add a CategoryID-column to the Field-table (NOT NULL FK). This way you define a field several times with differing CategoryIDs (combined unique index!). A category's fields list you'll get simply by asking the Field-table for all fields with the given CategoryID.
Better in my eyes was a mapping table CategoryField. If you enforce a unique FieldID you'll get for sure, that no field is mapped twice. And add a unique index on the combination of CategoryID and FieldID...
A SELECT could be something like this (SQL Server Syntax, untested):
SELECT Teacher.TeacherID
,Teacher.FieldID --might be NULL
,Teacher.CategoryID --never NULL
,Teacher.[... Other columns ...]
,Field.Name --might be NULL
--The following columns you pick from the right source,
--depending on the return value of the LEFT JOIN to Field and the related "catField"
--the directly joined "Category" (which is never NULL) is the "default"
,ISNULL(catField.CategoryID,Category.CategoryID) AS ResolvedCategoryID
,ISNULL(catField.Name,Category.Name) AS ResolvedCategoryName
,[... Other columns ...]
FROM Teacher
INNER JOIN Category ON Teacher.CategoryID=Category.CategoryID --never NULL
LEFT JOIN Field ON Teacher.FieldID=Field.FieldID --might be NULL
LEFT JOIN Category AS catField ON Field.CategoryID=catField.CategoryID
This was the answer before the EDIT:
I try to help you even if the concept is not absolutely clear to me
Teacher-Table: TeacherID, person's data (name, address...), ...
Category-Table: CategoryID, category title, ...
Field-Tabls: FieldID, field title, ...
You say, that fields are bound to a category in all cases. If this is the same category in all cases, you should set the category as a FK-column in the Field-Table. If there is the slightest chance, that a field's category could differ with the context, you should not...
Same with teachers: If a teacher is ever bound to one single category set a FK-column within the Teacher-table, otherwise don't.
The most flexible you'll be with at least one mapping table:
(SQL Server Syntax)
CREATE TABLE TeacherFieldCategory
(
--A primary key to identify this row. This is not needed actually, but it will serve as clustered key index as a lookup index...
TeacherFieldCategoryID INT IDENTITY NOT NULL CONSTRAINT PK_TeacherFieldCategory PRIMARY KEY
--Must be set
,TeacherID INT NOT NULL CONSTRAINT FK_TeacherFieldCategory_TeacherID FOREIGN KEY REFERENCES Teacher(TeacherID)
--Field may be left NULL
,FieldID INT NULL CONSTRAINT FK_TeacherFieldCategory_FieldID FOREIGN KEY REFERENCES Field(FieldID)
--Must be set. This makes sure, that a teacher ever has a category and - if the field is set - the field will have a category
,CategoryID INT NOT NULL CONSTRAINT FK_TeacherFieldCategory_CategoryID FOREING KEY REFERENCES Category(CategoryID)
);
--This unique index will ensure, that each combination will exist only once.
CREATE UNIQUE INDEX IX_TeacherFieldCategory_UniqueCombination ON TeacherFieldCategory(TeacherID,FieldID,CategoryID);
It could be a better concept to have a mapping table FieldCategory and this table mapped to the mapping table above through a foreign key. Doing so you could avoid invalid field-category combinations.
Hope this helps...

Related

SQL - NULL foreign key

Please have a look at the database design below:
create table Person (id int identity, InvoiceID int not null)
create table Invoice (id int identity, date datetime)
Currently all persons have an invoiceID i.e. the InvoiceID is not null.
I want to extend the database so that some person does not have an Invoice. The original developer hated nulls and never uses them. I want to be consistent so I am wondering if there are other patterns I can use to extend the database to meet this requirement. How can this be approached without using nulls?
Please note that the two tables above are for illustration purposes. They are not the actual tables.
NULL is a very important feature in databases and programming in general. It is significantly different from being zero or any other value. It is most commonly used to signify absence of value (though it also can mean unknown value, but that's less used as the interpretation). If some people do not have an invoice, then you should truly allow NULL, as that matches your desired Schema
A common pattern would be to store that association in a separate table.
Person: Id
Invoice: Id
Assoc: person_id, assoc_id
Then if a person doesn't have an invoice, you simply don't have a row. This approach also allows a person to have more than one invoice id which might make sense.
The only way to represent the optional relationship while avoiding nulls is to use another table, as some other answers have suggested. Then the absence of a row for a given Person indicates the person has no Invoice. You can enforce a 1:1 relationship between this table and the Person table by making person_id be the primary or unique key:
CREATE TABLE PersonInvoice (
person_id INT NOT NULL PRIMARY KEY,
invoice_id INT NOT NULL,
FOREIGN KEY (person_id) REFERENCES Person(id),
FOREIGN KEY (invoice_id) REFERENCES Invoice(id)
);
If you want to permit each person to have multiple invoices, you can declare the primary key as the pair of columns instead.
But this solution is to meet your requirement to avoid NULL. This is an artificial requirement. NULL has a legitimate place in a data model.
Some relational database theorists like Chris Date eschew NULL, explaining that the existence of NULL leads to some troubling logical anomalies in relational logic. For this camp, the absence of a row as shown above is a better way to represent missing data.
But other theorists, including E. F. Codd who wrote the seminal paper on relational theory, acknowledged the importance of a placeholder that means either "not known" or "not applicable." Codd even proposed in a 1990 book that SQL needed two placeholders, one for "missing but applicable" (i.e. unknown), and the other for "missing but inapplicable."
To me, the anomalies we see when using NULL in certain ways are like the undefined results we see in arithmetic when we divide by zero. The solution is: don't do that.
But certainly we shouldn't use any non-NULL value like 0 or '' (empty string) to represent missing data. And likewise we shouldn't use a NULL as if it were an ordinary scalar value.
I wrote more about NULL in a chapter titled "Fear of the Unknown" in my book, SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.
You need to move the invoice/person relation to another table.
You end up with
create table Person (id int person_identity)
create table PersonInvoice (id int person_id, InvoiceID int not null)
create table Invoice (id int identity, date datetime)
You need this for some databases to allow in InvoiceId to be a foreign key as some do not allow NULLS in a foreign key.
If a person only can have one invoice then PersonInvoice can have a unique constraint on the person_id as well as the two columns together. You can also enforce having a single person for a invoice by adding a unique constraint to the invoiceID field.

Is the following acceptable foreign key usage

I have the following database, the first table users is a table containing my users, userid is a primary key.
The next is my results table, now for each user, there can be a result with an id and it can be against an exam. Is it ok in this scenario to use "id" as a primary key and "userid" as a foreign key? Is there a better way I could model this scenario?
These then link to the corresponding exams...
I would probably not have userid as a varchar. I would have that as an int as well.
So the user table is like this:
userId int
userName varchar
firstName varchar
lastName varchar
And then the forenkey in the results table table would be an int. Like this:
userId int
result varchar
id int
examid INT
Becuase if you are plaing on JOIN ing the tables together then JOIN ing on a varchar is not as fast as JOIN ing on a INT
EDIT
That depend on how much data you are planing to store. Beause you know that there is a minimum chans that GUIDs are not unique. Simple proof that GUID is not unique. I think if I would design this database I would go with an int. Becuase it feels a little bit overkill to use a GUID as a userid
Provided that each user/exam will only ever produce one result, then you could create a composite key using the userid and exam columns in the results table.
Personally though, i'd go with the arbitrary id field approach as I don't like having to pass in several values to reference records. But that's just me :).
Also, the exam field in the results table should also be a foreign key.
Another way of doing this could be to abstract the Grade Levels from the Exam, and make the Exam a unique entity (and primary key) on its own table. So this would make a Grade Levels table (pkey1 = A, pkey2 = B, etc) where the grade acts as the foreign key in your second table, thus removing an entire field.
You could also normal out another level and make a table for Subjects, which would be the foreign key for a dedicated Exam Code table. You can have ENG101, ENG102, etc for exams, and the same for the other exam codes for the subject. The benefit of this is to maintain your exams, subjects, students and grade levels as unique entities. The primary and foreign keys of each are evident, and you keep a simple maintenance future with room to scale up.
You could consider using composite keys, but this is a nice and simple way to start, and you can merge tables for indexing and compacting as required.
Please make sure you first understand Normal Forms before actually normalizing your schema.

SQL One-to-One Relationship Definition

I'm designing a database and I'm not sure how to define one of the relationships. Here's the situation:
An invoice is created
If the product is not in stock then it needs to be manufactured and so a work order is created.
The relationship is one-to-one. However work orders are sometimes created for other purposes so the WorkOrder table will also be linked to other tables in a similar one-to-one relationship. Also, some Invoices won't have a work order at all. This means I can't define these relationships in the normal way by using the same primary key in both tables. Instead of doing this I've created a linking table and then set unique indexes on both fields to define the one-to-one relationship (see image).
(source: markevans.org)
.
Is this the best way?
Cheers
Mark
EDIT: I just realised that this design will allow a single work order to be linked to an invoice and also to one of the other tables I mentioned via 2 linking tables. I guess no solution is perfect.
Okay, this answer is SQL Server specific, but should be adaptable to other RDBMSs, with a little work. So far as I see, we have the following constraints:
An invoice may be associated with 0 or 1 Work Orders
A Work Order must be associated with an invoice or an ABC or a DEF
I'd design the WorkOrder table as follows:
CREATE TABLE WorkOrder (
WorkOrderID int IDENTITY(1,1) not null,
/* Other Columns */
InvoiceID int null,
ABCID int null,
DEFID int null,
/* Etc for other possible links */
constraint PK_WorkOrder PRIMARY KEY (WorkOrderID),
constraint FK_WorkOrder_Invoices FOREIGN KEY (InvoiceID) references Invoice (InvoiceID),
constraint FK_WorkOrder_ABC FOREIGN KEY (ABCID) references ABC (ABCID),
/* Etc for other FKs */
constraint CK_WorkOrders_SingleFK CHECK (
CASE WHEN InvoiceID is null THEN 0 ELSE 1 END +
CASE WHEN ABCID is null THEN 0 ELSE 1 END +
CASE WHEN DEFID is null THEN 0 ELSE 1 END
/* + other FK columns */
= 1
)
)
So, basically, this table is constrained to only FK to one other table, no matter how many PKs are defined. If necessary, a computed column could tell you the "Type" of item that this is linked to, based on which FK column is non-null, or the type and a single int column could be real columns, and InvoiceID, ABCID, etc could be computed columns.
The final thing to ensure is that an invoice only has 0 or 1 Work Orders. If your RDMBS ignores nulls in unique constraints, this is as simple as applying such a constraint to each FK column. For SQL Server, you need to use a filtered index (>=2008) or an indexed view (<=2005). I'll just show the filtered index:
CREATE UNIQUE INDEX IX_WorkItems_UniqueInvoices on
WorkItem (InvoiceID) where (InvoiceID is not null)
Another way to deal with keeping WorkOrders straight is to include a WorkOrder type column in WorkOrder (e.g. 'Invoice','ABC','DEF'), including a computed or column constrained by check constraint to contain the matching value in the link table, and introduce a second foreign key:
CREATE TABLE WorkOrder (
WorkOrderID int IDENTITY(1,1) not null,
Type varchar(10) not null,
constraint PK_WorkOrder PRIMARY KEY (WorkOrderID),
constraint UQ_WorkOrder_TypeCheck UNIQUE (WorkOrderID,Type),
constraint CK_WorkOrder_Types CHECK (Type in ('INVOICE','ABC','DEF'))
)
CREATE TABLE Invoice_WorkOrder (
InvoiceID int not null,
WorkOrderID int not null,
Type varchar(10) not null default 'INVOICE',
constraint PK_Invoice_WorkOrder PRIMARY KEY (InvoiceID),
constraint UQ_Invoice_WorkOrder_OrderIDs UNIQUE (WorkOrderID),
constraint FK_Invoice_WorkOrder_Invoice FOREIGN KEY (InvoiceID) references Invoice (InvoiceID),
constraint FK_Invoice_WorkOrder_WorkOrder FOREIGN KEY (WorkOrderID) references WorkOrder (WorkOrderID),
constraint FK_Invoice_WorkOrder_TypeCheck FOREIGN KEY (WorkOrderID,Type) references WorkOrder (WorkOrderID,Type),
constraint CK_Invoice_WorkOrder_Type CHECK (Type = 'INVOICE')
)
The only disadvantage to this model, although closer to your original proposal, is that you can have a work order that isn't actually linked to any other item (although it claims to be for an e.g INVOICE).
What you have looks to be a perfectly normal way to construct your tables.
If you think you might like to use only one link table between your WorkOrder table and whatever other tables that may have WorkOrders, you could use a link table like:
WorkOrders
OtherId (Could be InvoiceId, or an ID for SomethingElse that may have a WorkOrder)
OtherType (ENUM - something like 'Invoice', 'SomethingElse')
WorkOrderId
So the issue is that you can have invoices that don't have work orders and work orders that don't have invoices but the two need to be linked when there is a link. I would say based upon that description that your database diagram is pretty good. This would open you up to allowing more than a one-to-one relationship. This way down the road you can consider having two work orders for one invoice. You might also have one work order that handles two invoices. This opens you up to a lot of possibilities that you may not need now but that you might in the future.
I would recommend your current design. In the future, you may want to add more information about the link between invoice and work order. This middle table will allow you to add this information.
In the interest of fairness to the other side of the coin, you do need to consider speed/number of tables/etc. that this will cause. For example, you have now created a third table which increased your table count by 50% in this example. Look at the rest of your database. If you did this everywhere, you would probably have the most normalized database but it might not be the most performant because of all the joins that are necessary. Basically, this isn't a "one-size-fits-all" solution. Instead it is a design choice. Personally, I hate nullable foreign key fields. I find they don't give me the granularity I usually want with my database designs.
Your schema corresponds to a many-to-many link between the 2 tables. You are de facto opening here the possibility to have one work order for multiple invoices, and multiple work orders for one invoice. The model offers then possibilities far above the rules you are setting.
You could use a simpler schema, that will reflect the (0,1) relation between work orders and invoices, and the (0,1) relation between Invoices and Work orders:
a Work Order can be independant from
an invoice, or linked to one specific
invoice: it has a (0,1) relation to Invoice table
An invoice can have no work orders, or one work orders: it has a (0,1) relation to Work Orders Table
Such a relation can be translated by the following model and rules
Invoice
id_Invoice, Primary Key
WorkOrder
id_WorkOrder, Primary Key
id_Invoice, Foreign Key, Nulls accepted, unique value
With such a structure, it will be easy to add new 'dependants' to work orders table. If, for example, you want to open the possibility to launch work orders from restocking orders (where you want to have minimal quantities of some items in stock), you can then just add the corresponding field to the WorkOrder table:
id_RestockingOrder, ForeignKey, Nulls accepted, unique value
You'll be then able to 'see' from where your WorkOrder comes: an invoice, a restocking order, etc.
Seems it corresponds to your needs.
Edit:
as noted by #mark, SQL Server will not allow multiple null values, in contradiction with ANSI specs (check here for some more details), As we do not want to wait for SQL Server 2011 to have this rule implemented, there is a workaround here, where you can build a view excluding the null values and set a unique index on this view. I must admit that I did not like this solution ...
There is still the possibility to implement the 'unique if not null' rule in your code. It will still be simpler than implementing the many-to-many model (with the Invoice_WorkOrder table) you are proposing and manage all additional unicity rules that you'll need to implement.
There is no real need for the link table, just have them linked directly and allow for NULL in the reference field of the work order. Because a work order can be linked to multiple tables what I would do is add a reference id on every work order to every table that can link from it. So you would have:
Invoice
PK - ID
FK - WorkOrderID
SomeOtherTable
PK - ID
FK - WorkOrderID
WorkOrder
PK - ID
FK - InvoiceID (allow NULL)
FK - SomeOtherTableID (allow NULL)
To make sure a WorkOrder is linked to only one item, you have to use code to validate the row (or perhaps a stored procedure which I cannot come up with right now).
EDIT: PS, if you want to use a link table, give it a generic name and add all the linked tables with the same sort of construct I just described allowing for NULL's. In my eyes adding the extra table makes the schema larger than it needs to be, but if a work order contains a lot of big text fields it could increase performance slightly and reduce database size with all the indexes flying around. In anything but the largest applications, I would consider it over-normalization though, but that is a matter of style.

Can I have 2 unique columns in the same table?

I have 2 tables:
roomtypes[id(PK),name,maxAdults...]
features(example: Internet in room, satelite tv)
Can both id and name field be unique in the same table in mysql MYISAM?
If the above is posible, I am thinking of changing the table to:
features[id(PK),name,roomtypeID] ==> features[id(PK),name,roomtypeNAME]
...because it is helping me not to do extra querying in presentation for features because the front end users can't handle with IDs.
Of course, you can make one of them PRIMARY and one UNIQUE. Or both UNIQUE. Or one PRIMARY and four UNIQUEs, if you like
Yes, you can define UNIQUE constraints to columns other than the primary key in order to ensure the data is unique between rows. This means that the value can only exist in that column once - any attempts to add duplicates will result in a unique constraint violation error.
I am thinking of changing the FEATURES table to features[id(PK), name, roomtypeNAME] because it is helping me not to do extra querying in presentation for features because the front end users can't handle with IDs.
There's two problems:
A unique constraint on the ROOM_TYPE_NAME wouldn't work - you'll have multiple instances of a given room type, and a unique constraint is designed to stop that.
Because of not using a foreign key to the ROOM_TYPES table, you risk getting values like "Double", "double", "dOUBle"
I recommend sticking with your original design for sake of your data; your application is what translates a room type into its respective ROOM_TYPE record while the UI makes it presentable.
I would hope so otherwise MySQL is not compliant with the SQL standard. You can only have one primary key but you can mark other columns as unique.
In SQL, this is achieved with:
create table tbl (
colpk char(10) primary key,
coluniq char(10) unique,
colother char(10)
);
There are other ways to do it (particularly with multi-part keys) but this is a simple solution.
Yes you can.
Also keep in mind that MySQL allow NULL values in unique columns, whereas a column that is a primary key cannot have a NULL value.
1 RoomType may have many Features
1 Feature may be assigned to many RoomTypes
So what type of relationship do i have? M:N ?
You have there a many-to-many relationship, which has to be represented by an extra table.
That relationship table will have 2 fields: the PK of RoomTypes and the PK of Features.
The PK of the relationship table will be made of those 2 fields.
If that's usefull, you can add extra fields like the Quantity.
I would like to encourage you to read about database Normalization, which is he process of creating a correct design for a relational database. You can Google for that, or look eventually here (there are plenty of books/web pages on this)
Thanks again for very helpful answers.
1 roomType may have many features
1 feature may be assigned to many roomTypes
So what type of relationship do i have? M:N ?
If yes the solution I see is changing table structure to
roomTypes[id,...,featuresIDs]
features[id(PK),name,roomtypeIDs] multiple roomTypesIDs separated with comma?

How do I implement this multi-table database design/constraint, normalized?

I have data that kinda looks like this...
Elements
Class | Synthetic ID (pk)
A | 2
A | 3
B | 4
B | 5
C | 6
C | 7
Elements_Xref
ID (pk) | Synthetic ID | Real ID (fk)
. | 2 | 77-8F <--- A class
. | 3 | 30-7D <--- A class
. | 6 | 21-2A <--- C class
. | 7 | 30-7D <--- C class
So I have these elements that are assigned synthetic IDs and are grouped into classes. But these synthetic IDs are then paired with Real IDs that we actually care about. There is also a constraint that a Real ID cannot recur in a single class. How can I capture all of this in one coherent design?
I don't want to jam the Real ID into the upper table because
It is nullable (there are periods where we don't know what the Real ID of something should be).
It's a foreign key to more data.
Obviously this could be done with triggers acting as constraints, but I'm wondering if this could be implemented with regular constraints/unique indexes. Using SQL Server 2005.
I've thought about having two main tables SyntheticByClass and RealByClass and then putting IDs of those tables into another xref/link table, but that still doesn't guarantee that the classes of both elements match. Also solvable via trigger.
Edit: This is keyword stuffing but I think it has to do with normalization.
Edit^2: As indicated in the comments below, I seem to have implied that foreign keys cannot be nullable. Which is false, they can! But what cannot be done is setting a unique index on fields where NULLs repeat. Although unique indexes support NULL values, they cannot constraint more than one NULL in a set. Since the Real ID assignment is initially sparse, multiple NULL Real IDs per class is more than likely.
Edit^3: Dropped the redundant Elements.ID column.
Edit^4: General observations. There seems to be three major approaches at work, one of which I already mentioned.
Triggers. Use a trigger as a constraint to break any data operations that would corrupt the integrity of the data.
Index a view that joins the tables. Fantastic, I had no idea you could do that with views and indexes.
Create a multi-column foreign key. Didn't think of doing this, didn't know it was possible. Add the Class field to the Xref table. Create a UNIQUE constraint on (Class + Real ID) and a foreign key constraint on (Class + Synthetic ID) back to the Elements table.
Comments from before the question was made into a 'bonus' question
What you'd like to be able to do is express that the join of Elements and Elements_Xref has a unique constraint on Class and Real ID. If you had a DBMS that supported SQL-92 ASSERTION constraints, you could do it.
AFAIK, no DBMS supports them, so you are stuck with using triggers.
It seems odd that the design does not constrain Real ID to be unique across classes; from the discussion, it seems that a given Real ID could be part of several different classes. Were the Real ID 'unique unless null', then you would be able to enforce the uniqueness more easily, if the DBMS supported the 'unique unless null' concept (most don't; I believe there is one that does, but I forget which it is).
Comments before edits made 2010-02-08
The question rules out 'jamming' the Real_ID in the upper table (Elements); it doesn't rule out including the Class in the lower table (Elements_Xref), which then allows you to create a unique index on Class and Real_ID in Elements_Xref, achieving (I believe) the required result.
It isn't clear from the sample data whether the synthetic ID in the Elements table is unique or whether it can repeat with different classes (or, indeed whether a synthetic ID can be repeated in a single class). Given that there seems to be an ID column (which presumably is unique) as well as the Synthetic ID column, it seems reasonable to suppose that sometimes the synthetic ID repeats - otherwise there are two unique columns in the table for no very good reason. For the most part, it doesn't matter - but it does affect the uniqueness constraint if the class is copied to the Elements_Xref table. One more possibility; maybe the Class is not needed in the Elements table at all; it should live only in the Elements_Xref table. We don't have enough information to tell whether this is a possibility.
Comments for changes made 2010-02-08
Now that the Elements table has the Synthetic ID as the primary key, things are somewhat easier. There's a comment that the 'Class' information actually is a 'month', but I'll try to ignore that.
In the Elements_Xref table, we have an unique ID column, and then a Synthetic ID (which is not marked as a foreign key to Elements, but presumably must actually be one), and the Real ID. We can see from the sample data that more than one Synthetic ID can map to a given Real ID. It is not clear why the Elements_Xref table has both the ID column and the Synthetic ID column.
We do not know whether a single Synthetic ID can only map to a single Real ID or whether it can map to several Real ID values.
Since the Synthetic ID is the primary key of Elements, we know that a single Synthetic ID corresponds to a single Class.
We don't know whether the mapping of Synthetic ID to Real ID varies over time (it might as Class is date-related), and whether the old state has to be remembered.
We can assume that the tables are reduced to the bare minimum and that there are other columns in each table, the contents of which are not directly material to the question.
The problem states that the Real ID is a foreign key to other data and can be NULL.
I can't see a perfectly non-redundant design that works.
I think that the Elements_Xref table should contain:
Synthetic ID
Class
Real ID
with (Synthetic ID, Class) as a 'foreign key' referencing Elements, and a NOT NULL constraint on Real ID, and a unique constraint on (Class, Real ID).
The Elements_Xref table only contains rows for which the Real ID is known - and correctly enforces the uniqueness constraint that is needed.
The weird bit is that the (Synthetic ID, Class) data in Elements_Xref must match the same columns in Elements, even though the Synthetic ID is the primary key of Elements.
In IBM Informix Dynamic Server, you can achieve this:
CREATE TABLE elements
(
class CHAR(1) NOT NULL,
synthetic_id SERIAL NOT NULL PRIMARY KEY,
UNIQUE(class, synthetic_id)
);
CREATE TABLE elements_xref
(
class CHAR(1) NOT NULL,
synthetic_id INTEGER NOT NULL REFERENCES elements(synthetic_id),
FOREIGN KEY (class, synthetic_id) REFERENCES elements(class, synthetic_id),
real_id CHAR(5) NOT NULL,
PRIMARY KEY (class, real_id)
);
I would:
Create a UNIQUE constraint on Elements(Synthetic ID, Class)
Add Class column to Elements_Xref
Add a FOREIGN KEY constraint on Elements_Xref table, referring to (Synthetic ID, Class)
At this point we know for sure that Elements_Xref.Class always matches Elements.Class.
Now we need to implement "unique when not null" logic. Follow the link and scroll to section "Use Computed Columns to Implement Complex Business Rules":
Indexes on Computed Columns: Speed Up Queries, Add Business Rules
Alternatively, you can create an indexed view on (Class, RealID) with WHERE RealID IS NOT NULL in its WHERE clause - that will also enforce "unique when not null" logic.
Create an indexed view for Elements_Xref with Where Real_Id Is Not Null and then create a unique index on that view
Create View Elements_Xref_View With SchemaBinding As
Select Elements.Class, Elements_Xref.Real_Id
From Elements_Xref
Inner Join Element On Elements.Synthetic_Id = Elements_Xref.Synthetic_Id
Where Real_Id Is Not Null
Go
Create Unique Clustered Index Elements_Xref_Unique_Index
On Elements_Xref_View (Class, Real_Id)
Go
This serves no other purpose other than simulating a unique index that treats nulls properly i.e. null != null
You can
Create a view from the the result set of joining Elements_Xref and Elements together on Synthetic ID
add a unique constraint on class, and [Real ID]. In other news, this is also how you do functional indexes in MSSQL, by indexing views.
Here is some sql:
CREATE VIEW unique_const_view AS
SELECT e.[Synthetic ID], e.Class, x.[Real ID]
FROM Elements AS e
JOIN [Elements_Xref] AS x
ON e.[Synthetic ID] = x.[Synthetic ID]
CREATE UNIQUE INDEX unique_const_view_index ON unique_const_view ( Class, [Real ID] );
Now, apparently, unbeknownst to myself this solution doesn't work in Microsoft-land-place because with MS SQL Server duplicate nulls will violate a UNIQUE constraint: this is against the SQL spec. This is where the problem is discussed about.
This is the Microsoft workaround:
create unique nonclustered index idx on dbo.DimCustomer(emailAddress)
where EmailAddress is not null;
Not sure if that is 2005, or just 2008.
I think a trigger is your best option. Constraints can't cross to other tables to get information. Same thing with a unique index (although I suppose a materialized view with an index might be possible), they are unique within the table. When you put the trigger together, remember to do it in a set-based fashion not row-by-row and test with a multi-row insert and multi-row update where the real key is repeated in the dataset.
I don't think either of your two reasons are an obstacle to putting Real ID in Elements. If a given element has 0 or 1 Real IDs (but never more than 1), it should absolutely be in the Elements table. This would then allow you to constrain uniqueness within Class (I think).
Could you expand on your two reasons not to do this?
Create a new table real_elements with fields Real ID, Class and Synthetic ID with a primary key of Class, RealId and add elements when you actually add a RealID
This constrains Real IDs to be unique for a class and gives you a way to match a class and real ID to the synthetic ID
As for Real ID being a foreign key do you mean that if it is in two classes then the data keyed off it will be the same. If so the add another table with key Real Id. This key is then a foreign key into real_elements and any other table needing real ID as foreign key