Relational Data: entity inheritance approaches. Best practice - sql

There are several approaches to storing an entity hierarchy in a relational database.
For example, there is a person entity (20 basic attributes), a student entity (the same as person, plus several specific fields), an employee entity (the same as person, plus some other fields), etc.
When would you advise using (and not using) the following data modeling approaches:
One big table with all possible fields + personType marker field (student or employee)
Table inheritance
One Table with XML field (or maybe another data type) to store all the custom fields
Something else but also relational...
Thank you in advance!

A database models facts, not objects, and each table should model a relatively self-contained set of facts. The consequence of this is that your tables should look something like this:
person { person_id PK, name, dob, ... }
student { person_id PK FK(person.person_id), admission_id, year_started, ... }
employee { person_id PK FK(person.person_id), salary_bracket, ... }
An additional consequence is that a student can also be an employee, which probably models real life closer than an inheritance graph would.
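A minimal sketch of the three tables above, using Python's built-in sqlite3 module so it can be run as-is. The column types, the sample row and the helper names (build_demo, roles) are assumptions for illustration; only the table and column names come from the answer.

```python
import sqlite3

# Table and column names follow the answer; types are assumed.
SCHEMA = """
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    dob       TEXT
);
CREATE TABLE student (
    person_id    INTEGER PRIMARY KEY REFERENCES person(person_id),
    admission_id TEXT,
    year_started INTEGER
);
CREATE TABLE employee (
    person_id      INTEGER PRIMARY KEY REFERENCES person(person_id),
    salary_bracket TEXT
);
"""

def build_demo():
    """Create the schema in memory and add one person who is both student and employee."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO person VALUES (1, 'Ada', '1990-01-01')")
    conn.execute("INSERT INTO student VALUES (1, 'A-17', 2010)")
    conn.execute("INSERT INTO employee VALUES (1, 'B2')")
    return conn

def roles(conn, person_id):
    """Return the names of the role tables that hold a row for this person."""
    found = []
    for table in ("student", "employee"):  # fixed list, not user input
        row = conn.execute(
            f"SELECT 1 FROM {table} WHERE person_id = ?", (person_id,)
        ).fetchone()
        if row:
            found.append(table)
    return found
```

Calling roles(build_demo(), 1) lists both role tables for the same person_id, which is the "student can also be an employee" case mentioned above.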

Have a look at the Hibernate inheritance mapping docs. There you will find three common approaches and a list of the pros and cons of each.

If you are using an ORM to implement your classes, the ORM tool will give you options, generally two: a single table for the class, or one table for the parent class and a separate table for each child class. I am using XPO from Devexpress.com, one such ORM framework, and it offers these two options.
If you use an ORM, I am afraid there are no other generic options.
Ying

Related

Mapping Variable Entity types in M:N relationship via EntityName table

I often see "linking" tables for M:N relationship, where N can be 1..X types of entities/classes, so that the table contains classNameId referring to ClassName table and classPK referring to the particular Entity table.
What is this called? Is there an alternative with the same effect that does not need the EntityName table?
In the ER model, entities and subentities can be related by inheritance, the same way classes and subclasses are in an object model. The problem comes up when you transform your ER model into a relational model. The relational model does not support inheritance as such.
The design pattern is called generalization-specialization, or gen-spec for short. Unfortunately, many database tutorials skip over how to design tables for a gen-spec situation.
But it's well understood. It looks quite different from your model, but you could create views that make it look like your model if necessary. Look up "generalization specialization relational modeling" for explanations of how to do this.
The main trick is that the specialized tables "inherit" the value of their primary key from the PK of the generalized table. The meaning of "inherit" here is that it's a copy of the same value. Thus the PK in each specialized table is also an FK link back to the corresponding entry in the generalized table.
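As a rough illustration of the shared-key trick and of the "views" remark, here is a small sqlite3 sketch. The tables party and organization, the view organization_full and the sample rows are all hypothetical names invented for this example.

```python
import sqlite3

def build_gen_spec():
    """Generalized table + specialized table sharing one key, plus a flattening view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    -- generalized table (hypothetical name)
    CREATE TABLE party (
        party_id INTEGER PRIMARY KEY,
        name     TEXT NOT NULL
    );
    -- specialized table: its PK is a copy of party.party_id and also an FK to it
    CREATE TABLE organization (
        party_id   INTEGER PRIMARY KEY REFERENCES party(party_id),
        tax_number TEXT
    );
    -- view that makes the pair look like one wide table
    CREATE VIEW organization_full AS
        SELECT p.party_id, p.name, o.tax_number
        FROM party p
        JOIN organization o ON o.party_id = p.party_id;

    INSERT INTO party VALUES (1, 'Acme'), (2, 'Bob');
    INSERT INTO organization VALUES (1, 'TX-001');
    """)
    return conn

def organizations(conn):
    """Read the specialized rows through the view."""
    return conn.execute(
        "SELECT party_id, name, tax_number FROM organization_full ORDER BY party_id"
    ).fetchall()
```

Only party 1 appears in the view, because only that row has a matching entry in the specialized table.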

ORM and many-to-many relationships

This is more or less a general question and not about any specific ORM or language in particular: this question comes up regardless of your ORM preference.
When mapping a many-to-many relationship it is possible to obscure the intermediary table or to make the intermediary table a part of your model. In the case that the intermediary table has valuable data beyond the relationship, how do you handle the mapping?
Consider the following tables:
CaseWorker (id, first_name, last_name)
CaseWorkerCases (case_worker_id, case_id, date_opened, date_closed)
Case (id, client_id, field_a, field_b)
As a programmer I would really rather be able to do:
CaseWorker.Cases
than
CaseWorker.CaseWorkerCases.Cases
On the one hand, the table CaseWorkerCases contains useful data and hiding the intermediary table makes accessing that data less than convenient. On the other, having to navigate through the intermediary table makes the common task of accessing Cases seem awkward.
I suppose one solution could be to expose the intermediate table in the model and then give the CaseWorker object a wrapper property. Something like:
public IEnumerable<Case> Cases
{
    get
    {
        return from caseWorkerCase in this.CaseWorkerCases
               select caseWorkerCase.Case;
    }
}
But that also seems wrong.
I regard many-to-many mappings as just a notational abbreviation for two one-to-many mappings with the intermediate table, as you call it, enabling simplification of the relationships. It only works where the relationships do not have attributes of their own. However, as understanding of the particular domain improves, I usually find that many-to-many mappings need to be broken down to allow attributes to be attached. So my usual approach these days is to simply use one-to-many mappings from the start.
I don't think your workaround is wrong. The complexities of these models have to be coded somewhere.
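For what it's worth, the same idea can be sketched outside any particular ORM. Below is a plain-Python version using dataclasses: the intermediate object stays in the model and a read-only property hides it for the common case. The class layout mirrors the tables in the question; open_cases is a hypothetical extra helper added only to show why keeping the link object around is useful.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

@dataclass
class Case:
    id: int
    client_id: int

@dataclass
class CaseWorkerCase:
    """The intermediate row: it carries data of its own (the dates)."""
    case: Case
    date_opened: date
    date_closed: Optional[date] = None

@dataclass
class CaseWorker:
    id: int
    first_name: str
    last_name: str
    case_worker_cases: List[CaseWorkerCase] = field(default_factory=list)

    @property
    def cases(self) -> List[Case]:
        """Shortcut that skips over the intermediate objects."""
        return [link.case for link in self.case_worker_cases]

    def open_cases(self) -> List[Case]:
        """Hypothetical helper: uses data that lives only on the link."""
        return [link.case for link in self.case_worker_cases
                if link.date_closed is None]
```

Usage would look like worker.cases for the simple traversal and worker.case_worker_cases when the dates are needed.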
I have a blog post about this exact topic: Many-to-many relationships with properties

Entity Framework : Table per Concrete Type and unique IDs across tables

I have a few tables that share only a few navigation properties and an ID.
I think Table per Concrete type inheritance would be interesting here.. (?)
It looks something like this :
Contact (Base, Abstract, not mapped)
- ContactID
- navigation properties to other tables (email, phone, ..)
Person : Contact (mapped to table Person with various properties + ContactID)
- various properties
Company : Contact (mapped to table Company with various properties + ContactID)
- various properties
Now for this to work, the primary key (contactID) should be unique across all tables.
2 options then:
- GUIDs (not a fan)
- an additional DB table generating identities (with just a ContactID field, deriving tables have FK), this would not be mapped in EF.
Is this setup doable ?
Also, what will happen in the ObjectContext ? What kind of temporary key does EF generate before calling SaveChanges ? Will it be unique across objects ?
Thanks for any thoughts.
mike.
We use a similar construction with the following db design:
ContactEntity
    ID
ContactPossibility
    ID
    Position
    ContactTypeID
    ContactEntityID
Address
    ID (=PK and FK to ContactPossibility.ID)
    Street
    etc.
Telephone
    ID (=PK and FK to ContactPossibility.ID)
    Number
    etc.
Person
    ID (=PK and FK to ContactEntity.ID)
    FirstName
    etc.
Company
    ID (=PK and FK to ContactEntity.ID)
    Name
    etc.
This results in two abstract classes in the entity model, ContactEntity (CE) and ContactPossibility (CP), and multiple derived classes (Address=CP, Email=CP, Person=CE, Company=CE). The abstract and derived classes (rows in the db ;) share the same unique identifier, because the ID field in each derived class is a foreign key to the primary key of the abstract class. We use GUIDs for this, because our software is required to function properly off-line (not connected to the main database) and we have to deal smoothly with synchronisation issues. Also, what's the problem with GUIDs?
Entity Framework supports this db / class design very well and we are very happy with it.
Is this setup doable ?
Also, what will happen in the ObjectContext ?
What kind of temporary key does EF generate before calling SaveChanges ?
Will it be unique across objects ?
The proposed setup is very very doable!
The ObjectContext behaves fine and will insert into, update and delete from the right tables for derived classes without effort. Temporary keys? You don't need them if you use the pattern of an ID for derived classes that is both the primary key and a foreign key to the abstract class. And with GUIDs you can be pretty sure the key is unique across objects.
Furthermore: The foreignKey from CP to CE will provide every CE (Person, Company, User, etc.) with a trackable collection of ContactPossibilities. Which is real cool and handy.
Hope this helps...
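To make the key pattern concrete without Entity Framework, here is a small sqlite3 sketch in Python. It only shows the database side: a GUID generated on the client, inserted first into the base table and then reused as the PK/FK of the derived table. The TEXT column type and the add_person helper are assumptions for illustration, not part of our actual setup.

```python
import sqlite3
import uuid

def build_contacts():
    """Base table plus two derived tables that reuse the base ID as PK and FK."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
    CREATE TABLE ContactEntity (ID TEXT PRIMARY KEY);
    CREATE TABLE Person (
        ID        TEXT PRIMARY KEY REFERENCES ContactEntity(ID),
        FirstName TEXT
    );
    CREATE TABLE Company (
        ID   TEXT PRIMARY KEY REFERENCES ContactEntity(ID),
        Name TEXT
    );
    """)
    return conn

def add_person(conn, first_name):
    """Insert base row and derived row under one client-generated GUID."""
    new_id = str(uuid.uuid4())  # generated off-line, no round trip to the database
    with conn:  # one transaction: both rows are written, or neither
        conn.execute("INSERT INTO ContactEntity (ID) VALUES (?)", (new_id,))
        conn.execute(
            "INSERT INTO Person (ID, FirstName) VALUES (?, ?)",
            (new_id, first_name),
        )
    return new_id
```

Because the key exists before anything is written, no temporary key is needed on the client side.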
(not enough space in the comments section)
I've been running some tests.
The thing is you're OK as long as you ONLY specify the subtype you're querying for (ex. 'Address' in your case).
But if you query for the base type (even if you don't need the subtypes info), ex. only ContactPossibility.ID, the generated SQL will UNION all subtype tables.
So querying your 'trackable' collection of ContactPossibilities can create a performance problem.
I tried to work around this by unmapping the base entity and splitting the inherited entities across their own table plus the common table, basically transforming the TPT into TPC: this worked fine from a conceptual perspective (after a lot of edmx editing). Until I realized this was stupid... :) Indeed, in that case you will always need to UNION all underlying tables to query for the common data...
(Though I'm not sure in the case described at the end of this post, didn't pursue to test it)
So I guess, since mostly I will need to query for a specific type (person, company, address, phone,..), it's gonna be OK for now and hoping MS will come with a fix in EF4.5.
So I'll have to be careful when querying, another interesting example :
Let's say you want to select a person and then query for his address, something like (tried to follow your naming) :
var person = from b in context.ContactEntities.OfType<Person>()
             where b.FirstName.StartsWith("X")
             select b;
var address = from a in context.ContactPossibilities.OfType<Address>()
              where a.ContactEntity == person.FirstOrDefault()
              select a;
this will produce a Union between all the tables of the Contact derived entities, and performance issues : generated SQL takes ContactPossibility table and joins to Address on ContactPossibilityID, then joins a union of all Contact derived tables joined with the base Contact table, before finally joining a filtered Person table.
However, consider the following alternative :
var person = from b in context.ContactEntities.OfType<Person>()
             where b.FirstName.StartsWith("X")
             select b;
var address = from a in context.ContactPossibilities.OfType<Address>()
              where a.ContactID == person.FirstOrDefault().ID
              select a;
This will work fine : generated SQL takes ContactPossibility table and joins to Address on ContactPossibilityID, and then joins the filtered Person table.
Mike.

ORM question - JPA

I'm reading Pro JPA 2. The book begins by talking about ORM in the first few pages.
It talks about mapping a single Java class named Employee with the following instance variables: id, name, startDate, salary.
It then goes on to the issue of how this class can be represented in a relational database and suggests the following scheme.
table A: emp
id - primary key
startDate
table B: emp_sal
id - primary key in this table, which is also a foreign key referencing the 'id' column in table A.
It thus seems to suggest that persisting an Employee instance to the database would require operations on two (multiple) tables.
Should the Employee class have an instance variable 'salary' in the first place?
I think it should possibly belong to a separate class (Class salary maybe?) representing salary and thus the example doesn't seem very intuitive.
What am I missing here?
First, the author explains that there are multiple ways to represent a class in a database: sometimes the mapping of a class to a table is straightforward, sometimes you don't have a direct correspondence between attributes and columns, and sometimes a single class is represented by multiple tables:
In scenario (C), the EMP table has
been split so that the salary
information is stored in a separate
EMP_SAL table. This allows the
database administrator to restrict
SELECT access on salary information to
those users who genuinely require it.
With such a mapping, even a single
store operation for the Employee class
now requires inserts or updates to two
different tables.
So even storing the data from a single class in a database can be a challenging exercise.
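To illustrate the quoted scenario, here is a rough sketch in Python with sqlite3 (not JPA) of what "a single store operation requires inserts to two tables" means in practice. The table names come from the question; the column types, the placement of name in emp, and the save/load helpers are assumptions.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Employee:
    id: int
    name: str
    start_date: str
    salary: int

def create_schema(conn):
    conn.executescript("""
    CREATE TABLE emp (
        id         INTEGER PRIMARY KEY,
        name       TEXT,
        start_date TEXT
    );
    -- salary lives in its own table so SELECT access can be restricted separately
    CREATE TABLE emp_sal (
        id     INTEGER PRIMARY KEY REFERENCES emp(id),
        salary INTEGER
    );
    """)

def save(conn, e):
    """Store one object: two INSERTs inside a single transaction."""
    with conn:
        conn.execute("INSERT INTO emp VALUES (?, ?, ?)", (e.id, e.name, e.start_date))
        conn.execute("INSERT INTO emp_sal VALUES (?, ?)", (e.id, e.salary))

def load(conn, emp_id):
    """Rebuild one object by joining the two tables on the shared key."""
    row = conn.execute(
        "SELECT e.id, e.name, e.start_date, s.salary "
        "FROM emp e JOIN emp_sal s ON s.id = e.id WHERE e.id = ?",
        (emp_id,),
    ).fetchone()
    return Employee(*row) if row else None
```

The class still has a plain salary attribute; only the mapping layer knows that it is stored elsewhere.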
Then, he describes how relationships are different. At the object level model, you traverse objects via their relations. At the relational model level, you use foreign keys and joins (sometimes via a join table that doesn't even exist at the object model level).
Inheritance is another "problem" and can be "simulated" in various ways at the relational model level: you can map an entire hierarchy into a single table, you can map each concrete class to its own table, you can map each class to its own table.
In other words, there is no direct and unique correspondence between an object model and a relational model. Both rely on different paradigms and the fit is not perfect. The difference between the two is known as the impedance mismatch, which is something ORMs have to deal with (allowing the mapping between an object model and the many possible representations in a relational model). And this is what the whole section you're reading is about. This is also what you missed :)

How to model a mutually exclusive relationship in SQL Server

I have to add functionality to an existing application and I've run into a data situation that I'm not sure how to model. I am restricted to creating new tables and code. If I need to alter the existing structure, I think my client may reject the proposal, although if it's the only way to get it right, this is what I will have to do.
I have an Item table that can be linked to any number of tables, and the number of these tables may increase over time. An Item can only be linked to one other table, but the record in the other table may have many items linked to it.
Examples of the tables/entities being linked to are Person, Vehicle, Building, Office. These are all separate tables.
Examples of Items are Pen, Stapler, Cushion, Tyre, A4 Paper, Plastic Bag, Poster, Decoration.
For instance a Poster may be allocated to a Person or Office or Building. In the future if they add a Conference Room table it may also be added to that.
My initial thoughts are:
Item
{
ID,
Name
}
LinkedItem
{
ItemID,
LinkedToTableName,
LinkedToID
}
The LinkedToTableName field will then allow me to identify the correct table to link to in my code.
I'm not overly happy with this solution, but I can't quite think of anything else. Please help! :)
Thanks!
It is not a good practice to store table names as column values. This is a bad hack.
There are two standard ways of doing what you are trying to do. The first is called single-table inheritance. This is easily understood by ORM tools but trades off some normalization. The idea is, that all of these entities - Person, Vehicle, whatever - are stored in the same table, often with several unused columns per entry, along with a discriminator field that identifies what type the entity is.
The discriminator field is usually an integer type, that is mapped to some enumeration in your code. It may also be a foreign key to some lookup table in your database, identifying which numbers correspond to which types (not table names, just descriptions).
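A minimal sketch of single-table inheritance with an integer discriminator, using Python's sqlite3 and an IntEnum for the code-side enumeration. The table name entity, the type-specific columns and the sample rows are invented for illustration.

```python
import sqlite3
from enum import IntEnum

class EntityType(IntEnum):
    """Code-side enumeration that the discriminator column maps to."""
    PERSON = 1
    VEHICLE = 2
    BUILDING = 3
    OFFICE = 4

def build_single_table():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE entity (
        id            INTEGER PRIMARY KEY,
        entity_type   INTEGER NOT NULL,  -- discriminator
        name          TEXT NOT NULL,
        date_of_birth TEXT,              -- used by persons only, NULL otherwise
        plate_number  TEXT               -- used by vehicles only, NULL otherwise
    );
    """)
    rows = [
        (1, int(EntityType.PERSON), "Alice", "1990-01-01", None),
        (2, int(EntityType.VEHICLE), "Van", None, "AB-123"),
        (3, int(EntityType.PERSON), "Bob", "1985-07-15", None),
    ]
    conn.executemany("INSERT INTO entity VALUES (?, ?, ?, ?, ?)", rows)
    return conn

def names_of(conn, entity_type):
    """Select one 'subclass' by filtering on the discriminator."""
    cur = conn.execute(
        "SELECT name FROM entity WHERE entity_type = ? ORDER BY id",
        (int(entity_type),),
    )
    return [r[0] for r in cur]
```

The unused columns per row are the normalization trade-off mentioned above.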
The other way to do this is multiple-table inheritance, which is better for your database but not as easy to map in code. You do this by having a base table which defines some common properties of all the objects - perhaps just an ID and a name - and all of your "specific" tables (Person etc.) use the base ID as a unique foreign key (usually also the primary key).
In the first case, the exclusivity is implicit, since all entities are in one table. In the second case, the relationship is between the Item and the base entity ID, which also guarantees uniqueness.
Note that with multiple-table inheritance, you have a different problem - you can't guarantee that a base ID is used by exactly one inheritance table. It could be used by several, or not used at all. That is why multiple-table inheritance schemes usually also have a discriminator column, to identify which table is "expected." Again, this discriminator doesn't hold a table name, it holds a lookup value which the consumer may (or may not) use to determine which other table to join to.
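One way the discriminator can be taken a step further (this goes beyond what is described above, so treat it as an optional variation) is to include it in the foreign key, so the database itself rejects a base ID being claimed by the "wrong" table. The sketch below uses Python's sqlite3; the names and type codes are hypothetical. Note that it only guarantees "at most one" sub-type row per base row, not "exactly one".

```python
import sqlite3

# Type codes are hypothetical: 1 = person, 2 = vehicle.
EXCLUSIVE_SCHEMA = """
CREATE TABLE entity (
    entity_id   INTEGER PRIMARY KEY,
    entity_type INTEGER NOT NULL,      -- discriminator
    name        TEXT NOT NULL,
    UNIQUE (entity_id, entity_type)    -- target of the composite FKs below
);
CREATE TABLE person (
    entity_id   INTEGER PRIMARY KEY,
    entity_type INTEGER NOT NULL CHECK (entity_type = 1),
    FOREIGN KEY (entity_id, entity_type)
        REFERENCES entity (entity_id, entity_type)
);
CREATE TABLE vehicle (
    entity_id   INTEGER PRIMARY KEY,
    entity_type INTEGER NOT NULL CHECK (entity_type = 2),
    FOREIGN KEY (entity_id, entity_type)
        REFERENCES entity (entity_id, entity_type)
);
"""

def build_exclusive():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(EXCLUSIVE_SCHEMA)
    return conn

def try_insert(conn, sql, params):
    """Run one INSERT; return False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

With a base row (1, type 1) in place, a person row for ID 1 is accepted, while a vehicle row for ID 1 fails either the CHECK or the composite foreign key.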
Multiple-table inheritance is a closer match to your current schema, so I would recommend going with that unless you need to use this with Linq to SQL or a similar ORM.
See here for a good detailed tutorial: Implementing Table Inheritance in SQL Server.
Find something common to Person, Vehicle, Building, Office. For the lack of a better term I have used Entity. Then implement super-type/sub-type relationship between the Entity and its sub-types. Note that the EntityID is a PK and a FK in all sub-type tables. Now, you can link the Item table to the Entity (owner).
In this model, one item can belong to only one Entity; one Entity can have (own) many items.
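A rough sqlite3 sketch of that model, written in Python so it can be run directly. Only the names Entity, EntityID and Item come from the description; the column types, the two sample sub-types and the items_owned_by helper are assumptions.

```python
import sqlite3

def build_ownership():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
    -- super-type
    CREATE TABLE Entity (
        EntityID INTEGER PRIMARY KEY,
        Name     TEXT NOT NULL
    );
    -- sub-types: EntityID is both PK and FK
    CREATE TABLE Person (EntityID INTEGER PRIMARY KEY REFERENCES Entity(EntityID));
    CREATE TABLE Office (EntityID INTEGER PRIMARY KEY REFERENCES Entity(EntityID));
    -- each item points at exactly one owner, whatever its sub-type
    CREATE TABLE Item (
        ItemID   INTEGER PRIMARY KEY,
        Name     TEXT NOT NULL,
        EntityID INTEGER NOT NULL REFERENCES Entity(EntityID)
    );

    INSERT INTO Entity VALUES (1, 'Alice'), (2, 'Room 101');
    INSERT INTO Person VALUES (1);
    INSERT INTO Office VALUES (2);
    INSERT INTO Item VALUES (10, 'Pen', 1), (11, 'Poster', 2), (12, 'Stapler', 2);
    """)
    return conn

def items_owned_by(conn, entity_id):
    """List item names for one owner; no table name is needed to resolve it."""
    cur = conn.execute(
        "SELECT Name FROM Item WHERE EntityID = ? ORDER BY ItemID", (entity_id,)
    )
    return [r[0] for r in cur]
```

Adding a Conference Room later would mean one more sub-type table, with no change to Item.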
Your link table is OK.
The trouble you will have is that you will need to generate dynamic SQL at runtime. Parameterized SQL does not typically allow the objects in the FROM list to be parameters.
If you want to avoid this, you may be able to denormalize a little, say by creating a table to hold the id (assuming the ids are unique across the other tables), the type_id representing which table is the source, and a generated description, e.g. the name value from the initial record.
You would trigger the creation of this denormalized list when the base info is modified, and you could use that for generalized queries, and then resort to your dynamic queries when needed at runtime.